The economy consists of various dynamic activities whose
interrelationships  vary  over  time.  There  have  been  many 
attempts  to  account  for  the  variations  of  stock  returns  with 
the  movements  of  those  dynamic  activities.  These  activities 
include  the  growth  rates  of  gross  national  products,  changes 
in  industrial  production,  inflation  rates,  short-  and  long- 
term interest  rates,  and  yields  on  corporate  bonds.  Those 
time-varying  relationships  with  stock  returns  can  be 
represented  by  the  coefficients  in  stock  return  generating 
equations.  Most  of  the  previous  work,  however,  relies  on  the 
very  restrictive  assumption  that  the  parameters  in  the 
equations  remain  constant  over  time.  This  restrictive 
assumption  leads  to  the  low  explanatory  power  of  the  models  on 
stock returns. In this work, I construct a more general model
of stock returns using the Kalman filter, which allows the
parameters to change over time. In addition, I generalize the
assumption on the error terms by letting the variance-covariance
matrices of the disturbance terms in both the measurement and
transition equations follow autoregressive conditional
heteroskedasticity.

Before performing empirical work on the model, I check
whether a general error distribution (GED), employed in
Nelson's work, or a normal distribution fits the distribution
of actual stock returns better, and find that the performance
of the GED is almost indistinguishable from that of the normal
distribution.

Under  the  assumption  that  the  normal  distribution  fits  the 
disturbance  terms  in  the  model,  the  null  hypothesis  that 
parameters  remain  constant  is  rejected  at  the  5 percent  level 
of  significance.  I also  find  little  evidence  that  stock 
markets  are  informationally  inefficient. 

Based on the estimation results, I find that the time-varying
parameter model has greater explanatory power than alternative
models of stock returns. I also re-examine features of U.S.
monthly stock returns.

CHAPTER 1
INTRODUCTION 


A great number of studies have examined how stock prices or
stock returns are determined and how risks are priced. The
capital asset pricing model (CAPM), introduced by Sharpe (1964),
Lintner (1965) and Mossin (1966), has long been a major
framework for analyzing financial markets. It states that
expected returns from holding securities should be linearly
related to the so-called market beta. But the CAPM uses
assumptions that are too strong to be testable.

As  a testable  alternative  to  the  CAPM,  the  arbitrage 
pricing  theory  (APT)  has  been  proposed  by  Ross  (1976)  and 
developed  by  Roll  (1977,  1981),  Roll  and  Ross  (1980),  Connor 
(1980),  Reinganum  (1981),  Shanken  (1982),  Chamberlain  (1983), 
Chamberlain  and  Rothschild  (1983),  Chen  (1984),  Dhrymes, 
Friend  and  Gultekin  (1984,  1985),  Wei  (1988),  Mei  (1989), 
Sawyer (1989), and others. The APT states that in equilibrium
expected returns from holding securities should be linear
combinations of factor loadings. In the APT models, however,
the explanatory factors are not explicitly identified, nor is
the uniqueness of the solutions obtained by factor analysis of
the covariance matrix of stock returns guaranteed.

Recently, many financial economists such as Black and
Fischer (1976), Bollerslev (1986), French, Schwert and
Stambaugh (1987), Nelson (1988a, b, c, 1989), Schwert (1989a,
b, c), LeBaron (1990), Merville & Pieptea (1989), and Kearns
and Pagan (1990) have attempted to relate the volatility of
stock returns to the volatilities of major economic variables
such as the real GNP growth rate, the inflation rate, growth
rates of industrial production, financial leverage, relative
price variability and dividend yields. Many of them also
implicitly or explicitly assume that stock returns are
determined by the volatility of returns. In most of the previous
work on volatility, stock volatility is measured by the variance
of the error terms in models of stock returns. Note that the
error term in a specific model is the part unexplained by the
model. Thus if, besides the variance of stock returns, variables
such as those described above are directly incorporated into
the model of stock returns, then the explanatory power of the
model will increase as long as they have some explanatory
power over the market variance of stock returns.

In  addition,  most  of  the  previous  studies  on  stock  returns 
mentioned  above  are  static  in  the  sense  that  they  assume  the 
constancy  of  the  coefficients  which  relate  the  stock  returns 
or  the  volatilities  of  the  stock  returns  to  explanatory 
economic  variables.  But  this  assumption  appears  to  be  less 
reasonable  if  we  take  into  account  the  fact  that  the  economy 
is  a collection  of  dynamic  activities  whose  relations  vary 
constantly.

Many of the studies on stock returns, like those by Engle
and Bollerslev (1986), Kearns and Pagan (1990), Bollerslev,
Engle & Wooldridge (1987), and Bollerslev (1988), employ the
linear GARCH model with normally distributed error terms. But
Nelson (1989) shows that the distribution of stock returns
tends to have fatter tails than the normal distribution. To
ensure nonnegativity of the variance, these studies also impose
the very restrictive assumption that all the coefficients in
the return variance formulas are nonnegative. This rules out
the possibility of oscillatory behavior of the return variance.
Next, the GARCH models assume that the distribution of stock
returns is symmetric, so they cannot explain the negative
skewness of stock returns or the tendency for market volatility
to rise after the market declines (Campbell and Hentschel,
1990). Third, in conventional GARCH models, the current
conditional variance $\sigma_t^2$ is determined by a linear
combination of the lagged variances $\sigma_{t-i}^2$,
$i = 1, 2, \ldots, p$, and squared terms of current and lagged
errors. Thus it does not reflect whether unanticipated excess
returns are positive or negative. But Schwert (1989a),
Bollerslev (1987) and Nelson (1988a, 1989) show that stock
returns are negatively correlated with changes in the volatility
of returns. Lastly, the GARCH models, in general, do not
highlight the economic theory that explains the behavior of
stock returns in equilibrium.
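
The sign-blindness of the conventional variance equation can be illustrated with a small sketch. It assumes the standard GARCH(1,1) recursion, in which the variance depends on the lagged squared error and the lagged variance; the function name and the parameter values are hypothetical and chosen only for illustration.

```python
def garch11_variance(errors, omega, alpha, beta, initial_variance):
    """Conditional variances from
    sigma2[t] = omega + alpha * e[t-1]**2 + beta * sigma2[t-1]."""
    variances = [initial_variance]
    for e in errors[:-1]:
        variances.append(omega + alpha * e * e + beta * variances[-1])
    return variances


# Hypothetical parameter values, not estimates from any data set.
OMEGA, ALPHA, BETA = 0.0001, 0.10, 0.85

shocks_up = [0.02, 0.05, 0.01]
shocks_down = [-e for e in shocks_up]  # same magnitudes, opposite signs

path_up = garch11_variance(shocks_up, OMEGA, ALPHA, BETA, 0.002)
path_down = garch11_variance(shocks_down, OMEGA, ALPHA, BETA, 0.002)

# Only squared errors enter, so both shock sequences give the same path.
print(path_up == path_down)
```

Because only squared errors enter the recursion, a market decline and a market rise of equal size imply the same next-period variance, which is the limitation described above.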

In this paper, using the Kalman filter, I build a more
dynamic model in which the coefficients of the return
generating structure and their conditional covariances are
allowed to vary over time. In addition, the variances of the
disturbances in the state space model, conditional on past
observations, are assumed to follow ARCH(1) processes. This
type of model can account for phenomena in the dynamic economy
more properly than static models can. In particular, the
covariance matrix of the disturbances in the transition
equation, conditional on the past covariance matrix, is
determined by a linear combination of a matrix of constants
and a cross product of the past disturbance terms, as given by
Eq. (4.1.5). It will be called a matrix ARCH (MARCH). There has
been little work on MARCH models in econometrics, so this
paper can be regarded as a frontier in this area. Moreover,
most of the previous work that applies a Kalman filter to stock
return or price data, such as Harvey and Ruiz (1990), assumes
that there is only one explanatory variable and that the
transition coefficient is known to be equal to one. These are
very restrictive assumptions because, as can be seen in
Chapter 2, several variables besides the one-period lagged
stock return (for example, real GNP, industrial production,
investment, the inflation rate, bond yields, and the short-term
interest rate) have some explanatory power over stock returns;
moreover, the coefficients on those variables are seldom equal
to one. So in this work I construct a more general model in
which some of the macroeconomic variables described above are
used as explanatory variables and the transition matrix of the
parameters is not restricted to be an identity matrix.
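
To make the idea of time-varying coefficients concrete, the following is a minimal sketch of a Kalman filter for a regression whose coefficients evolve through a transition equation. It is an illustration under simplifying assumptions, not the model of Chapter 4: the disturbance variances Q and R are held constant (no ARCH effects), and all names, dimensions and data are hypothetical.

```python
def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def mat_mul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def kalman_tvp(y, X, F, Q, R, beta0, P0):
    """Filter y[t] = X[t]'beta[t] + e[t],  beta[t] = F beta[t-1] + v[t].

    Q (matrix) and R (scalar) are constant here; ARCH effects in the
    disturbances are deliberately left out of this sketch.
    """
    beta, P = list(beta0), [row[:] for row in P0]
    filtered = []
    for y_t, x_t in zip(y, X):
        # Prediction step: propagate the state mean and covariance.
        beta_pred = mat_vec(F, beta)
        P_pred = mat_mul(mat_mul(F, P), transpose(F))
        P_pred = [[p + q for p, q in zip(pr, qr)] for pr, qr in zip(P_pred, Q)]
        # Update step: one-step-ahead forecast error and its variance.
        Px = mat_vec(P_pred, x_t)
        forecast_error = y_t - sum(a * b for a, b in zip(x_t, beta_pred))
        forecast_var = sum(a * b for a, b in zip(x_t, Px)) + R
        gain = [p / forecast_var for p in Px]
        beta = [b + g * forecast_error for b, g in zip(beta_pred, gain)]
        k = len(beta)
        P = [[P_pred[i][j] - gain[i] * Px[j] for j in range(k)] for i in range(k)]
        filtered.append(beta)
    return filtered


# Hypothetical data: intercept fixed at 0.5, slope drifting upward from 1.0.
T = 100
X = [[1.0, 1.0 + 0.1 * (t % 7)] for t in range(T)]
true_slope = [1.0 + 0.01 * t for t in range(T)]
y = [0.5 + b * x[1] for b, x in zip(true_slope, X)]

identity = [[1.0, 0.0], [0.0, 1.0]]
Q = [[1e-4, 0.0], [0.0, 1e-3]]
path = kalman_tvp(y, X, identity, Q, R=0.01, beta0=[0.0, 0.0],
                  P0=[[10.0, 0.0], [0.0, 10.0]])
print(path[-1])  # filtered [intercept, slope] at the final date
```

With F set to the identity and Q set to zero the recursion reduces to recursive least squares, which is the constant-parameter case that the static models implicitly assume.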

The structure of this work is as follows. In Chapter 2, I
summarize the characteristics of U.S. stock returns found in
previous work: the volatility of stock returns, the relationship
of volatility to economic activity, and the behavior of stock
returns. In Chapter 3, the performance when stock returns are
assumed to have a general error distribution is compared with
that when stock returns are assumed to have a Gaussian
distribution. The comparison between the two distributions is
performed with a naive estimator and a kernel estimator of
density, which are estimated with new methods. In Chapter 4, I
construct a more dynamic model consisting of a measurement
equation and a transition equation in which the disturbance
terms, conditional on past disturbances, are normally
distributed with zero means and variances following an ARCH(1)
process. I then derive the Kalman filter for this model and
present the procedure for optimizing the unknown parameters of
the model by maximum likelihood. In Chapter 5, I estimate a
simpler version of the model in which the variances of the
error terms do not follow an ARCH process, since estimation of
the more general version involves very high computational cost.
I also compare the performance of the model with those of
alternative models of stock returns and then perform tests of
parameter constancy and stock market efficiency. In Chapter 6,
based on the estimation results of the previous chapter, I
attempt to reinterpret some of the important features of stock
returns described in Chapter 2. Chapter 7 presents some
conclusions.


CHAPTER  2 

CHARACTERISTICS  OF  THE  U.S.  STOCK  RETURNS 

In this chapter, I summarize the results of previous work on
the behavior of stock returns. Some of them will be
reinterpreted in terms of the outcomes obtained from our model.


2.1  Volatility  of  Stock  Returns 

First, the volatility of returns of stocks and bonds varies
over time (French, Schwert, and Stambaugh, 1987; Schwert,
1989a; Merville and Pieptea, 1989; Turner, Startz and Nelson,
1989). In particular, volatility tends to be very high during
wars, economic recessions, oil shocks, and banking or financial
panics (Schwert, 1989a). But strong elastic forces pull stock
volatility back toward its long-term value (Merville and
Pieptea, 1989).

Second, shocks to the volatility of stock returns are
persistent (Nelson, 1989). It follows that volatilities are
highly autocorrelated, so the lagged volatility of returns has
more predictive power for contemporaneous volatility than other
variables such as the lagged volatility of inflation (PPI), bond
returns, interest rates, and the monetary base (Schwert, 1989a).
In other words, the autocorrelation decays very slowly even
though the autocorrelation $\rho(\sigma_{t+s}, \sigma_t)$ is a
decreasing function of the time lag $s$ (Merville and Pieptea,
1989). It follows that shocks have both permanent and
transitory components (Schwert, 1989b).

Third, in the short run, variation from the mean is
primarily determined by transitory movements, while in the
long run variation from the mean is primarily determined by
both permanent and transitory movements (Merville and Pieptea,
1989).

Fourth, a non-trading day contributes much less to
volatility than a trading day (French and Roll, 1986; Nelson,
1989d). Trading activity does not explain much of the
variation in volatility through time but trading volume does
(Schwert, 1989a).

Fifth, the volatility of stock returns is higher than that
of bond returns and interest rates, and the estimates of
volatility from daily data have much less error than those
from monthly data (Schwert, 1989a).

2.2  Relationship of Volatility to Economic Activity

First, expected returns vary over time (French, Schwert &
Stambaugh, 1987; Campbell & Shiller, 1988; Fama & French,
1988a, b; Turner, Startz & Nelson, 1989; Ball & Kothari,
1989). The negative autocorrelation of stock returns can be
explained largely by the time-varying expected returns, which
in turn are caused by variation in expected returns on the
market portfolio, in the relative risks of firms' investments,
and in leverage (Ball & Kothari, 1989). On the other hand,
French, Schwert, and Stambaugh (1987) showed that excess
returns are positively related to the predictable volatility
of stock returns while they are negatively related to the
unexpected change in the volatility of stock returns.

Second, a large fraction of annual stock return variation
can be accounted for by future real activity such as real
GNP, industrial production, and investment (Fama, 1990). The
future growth rate of industrial production has an especially
significant and positive effect on current real stock returns
(Kaul & Seyhun, 1990; Shanken & Weinstein, 1990). Fama and
French (1988b), however, found that earnings and dividend
policy have relatively low predictive power for stock return
volatility. In addition, Fama (1990) showed that real activity
explains more return variation for longer return horizons.

Third, declines in the stock market tend to be associated
with subsequent increases in volatility (Nelson, 1989;
Schwert, 1989a; Turner, Startz & Nelson, 1989).

2.3  Behavior of Stock Returns

First, the distribution of log stock returns tends to have
fatter tails and a more pronounced peak than the normal
distribution (Nelson, 1989; Bollerslev, 1987; Turner, Startz &
Nelson, 1989).

Second, the distribution of log stock returns is negatively
skewed, i.e., large negative log returns are more common than
large positive returns; see Kearns and Pagan (1990) and Campbell
and Hentschel (1990). The negative skewness also increases with
the conditional variance of stock returns (Campbell &
Hentschel, 1990).

Third, during the post-war period, stock returns are
negatively related to inflation or, more precisely, to relative
price variability. It follows that stock returns are
positively related to future real activity such as the growth
rate of industrial production. The negative effects of
relative price variability on future output, and thus on
contemporaneous real stock returns, are largely caused by
the supply shocks in the 1970s (Kaul & Seyhun, 1990).

Fourth, stock returns are predictable (Fama & French,
1988a; French, Schwert & Stambaugh, 1987). In particular,
returns are more predictable for portfolios of small firms
than for those of large firms, since in small-firm portfolios
the stationary components of stock prices tend to prevail over
the random components (Fama & French, 1988a). On the other hand,
Poterba and Summers (1988) argue that stock returns are
positively autocorrelated over short horizons and negatively
autocorrelated over long horizons. Not surprisingly,
long-horizon returns are more predictable, since the slowly
mean-reverting component of stock prices tends to induce
negative correlation in returns and slow mean reversion is more
evident in long-horizon returns (Fama & French, 1988a;
Ball & Kothari, 1989). Fama and French (1988a) ascribe the
negative autocorrelation of long-horizon returns to a common
macroeconomic phenomenon rather than to firm-specific factors.

Fifth, Fama (1990) found that the dividend yield, the default
spread, and the term spread are each positively related to
expected returns. But an unusually large realization of
dividend news, either good or bad, will have a negative effect
on stock returns (Campbell & Hentschel, 1990).


CHAPTER  3 

GED  OR  NORMAL  DISTRIBUTION? 

In his paper (1989), Nelson assumes that stock returns
follow the General Error Distribution (GED). In other words,
the probability density function of stock returns (y) is given by

f(y)  = 

where $\mu$ is the mean of stock returns, $\beta$ is a
tail-thickness parameter with range $-1 < \beta < 1$, and the
scale parameter is a positive number. If $\beta = 0$, then the
stock returns are normally distributed. If $0 < \beta < 1$, then
the distribution of y has a thicker tail than a normal
distribution. If $-1 < \beta < 0$, the distribution of y has a
thinner tail than a normal distribution.

To  determine  which  distribution  fits  the  empirical 
distribution  better,  we  need  to  estimate  the  density  function 
of  distribution  of  stock  returns. 

3.1  Methods  for  Estimating  the  Empirical  Density 

Many methods have been proposed for estimating the empirical
density function. Histograms, so-called naive estimators
(Fix & Hodges, 1951; Rosenblatt, 1956), kernel estimators, the
nearest neighbor method (Loftsgaarden & Quesenberry, 1965), the
variable kernel method (Breiman, Meisel & Purcell, 1977),
orthogonal series estimators (Cencov, 1962), maximum penalized
likelihood estimators (Good & Gaskins, 1971), general weight
function estimators (Whittle, 1958), the reflection and
replication techniques (Boneva, Kendall & Stefanov, 1971), and
the transformation techniques (Copas & Fryer, 1980) are well
known methods for density estimation.

3.1.1.  Histograms 

One of the simplest ways of estimating the density function
is to draw the histogram of the sample. To construct the
histogram, we split a range that covers the sample data into a
suitable number of bins of width h. Then, counting the number
of observations in each bin, we obtain the density estimator

$$\hat f(x) = \frac{1}{Nh}\,(\text{no. of } X_i \text{ in the same bin as } x)$$

where $\{X_1, X_2, \ldots, X_N\}$ is a sample of N real
observations and f() is the density function to be estimated.
Note that it is the choice of bin width h that primarily
determines the smoothness of the histogram.
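
As a minimal sketch of the histogram estimator, the function below counts the observations that share a bin with x and divides by N times the bin width. The bin origin, the bin width and the sample of returns are made-up values used only for illustration.

```python
import math


def histogram_density(sample, origin, h):
    """Return a function x -> (count of sample points in x's bin) / (N * h)."""
    counts = {}
    for value in sample:
        index = math.floor((value - origin) / h)
        counts[index] = counts.get(index, 0) + 1
    n = len(sample)

    def density(x):
        return counts.get(math.floor((x - origin) / h), 0) / (n * h)

    return density


returns = [-0.04, -0.01, 0.00, 0.01, 0.012, 0.03, 0.05]  # made-up returns
f_hat = histogram_density(returns, origin=-0.055, h=0.02)
print(f_hat(0.0))   # density in the bin that contains 0.0
print(f_hat(0.2))   # a bin with no observations
```

Shifting `origin` or changing `h` moves observations across bin edges, which is why the shape of the estimate depends so strongly on these two choices.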

3.1.2.  The Naive Estimator and the Least Square Estimator
Using Naive Estimation Data

The naive estimator of the density function is a slight
improvement on the histogram method. It uses the definition of
a probability density function f():

$$f(x) = \lim_{h \to 0} \frac{1}{2h} P(x - h < X < x + h) \tag{3.1.1}$$

Then the naive estimator of density is given by

$$\hat f(x) = \frac{1}{2Nh} \left[\text{no. of } X_1, \ldots, X_N \in (x - h, x + h)\right] \tag{3.1.2}$$

Methods such as the histogram and the naive estimator,
however, have some drawbacks. First, the estimate is not a
continuous function: it has jumps at the points $X_i \pm h$
and zero derivative everywhere else. Second, the choice of bin
width is made somewhat arbitrarily, so the smoothness of the
estimated density function can be misleading.
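
A short sketch of the naive estimator follows: it counts the observations falling in the open interval (x - h, x + h) and divides by 2Nh. The sample and the half-width are made-up values.

```python
def naive_density(sample, x, h):
    """Naive estimate: share of observations in (x - h, x + h), divided by 2h."""
    inside = sum(1 for value in sample if x - h < value < x + h)
    return inside / (2 * h * len(sample))


returns = [-0.04, -0.01, 0.00, 0.01, 0.012, 0.03, 0.05]  # made-up returns
print(naive_density(returns, 0.0, 0.02))
```

Evaluating the function on a fine grid of x values shows the step-like shape: the estimate changes only when x - h or x + h crosses an observation.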

In order to overcome these problems, instead of using the
naive estimator given by Eq. (3.1.2) directly as the estimator
of the true density, I use it as a kind of transformed data to
which a Gaussian distribution or a GED is fitted. The problems
described above then almost disappear. Let us investigate which
of the GED and the normal distribution fits the empirical
distribution better when the non-linear least squares (NLS)
method is applied. The main idea of this new method is to find
optimal estimators of
parameters in the density functions that minimize the mean
squared error (MSE) of the theoretical distribution from the
empirical distribution. For this purpose, the whole range
between the lowest and the highest value of stock returns is
split into intervals,[1] and the density estimate for each
interval is obtained by applying Eq. (3.1.2). The difference
between the empirical distribution $\hat f(y_i)$ and the
theoretical distribution $h(y_i)$ (GED or normal) at the center
$y_i$ of each interval is computed, and the sum of squared
differences is then minimized with respect to the unknown
parameters $\delta$ of the theoretical distribution:

$$\min_{\delta} \sum_{i=1}^{N} \left[\hat f(y_i) - h(y_i; \delta)\right]^2 \tag{3.1.3}$$

where N is the number of intervals. In addition, the density
function of the normal distribution to be estimated is given
by

$$f(y \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right] \tag{3.1.4}$$

The density estimator obtained from this method will be called
the least square estimator using the naive estimation data.

[1] The U.S. stock return data show that the lowest and highest
values of stock returns are -0.28794 and 0.37681, respectively.
The width of each interval will be 0.001 if the whole length
between -0.295 and 0.395 is divided into 690 intervals.
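
The fitting step can be sketched as follows for the normal case. This is an illustration under stated assumptions rather than the procedure actually used: the sample is simulated, the half-width in the naive estimator is taken to be half the interval width, and a coarse grid search stands in for a proper non-linear least squares routine.

```python
import math
import random


def naive_density(sample, x, h):
    inside = sum(1 for value in sample if x - h < value < x + h)
    return inside / (2 * h * len(sample))


def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)


def fit_normal_to_naive(sample, width, mu_grid, sigma_grid):
    """Minimize the sum of squared gaps between naive estimates at the
    interval centers and the normal density, over a grid of (mu, sigma)."""
    low, high = min(sample), max(sample)
    n_intervals = int((high - low) / width) + 1
    centers = [low + (i + 0.5) * width for i in range(n_intervals)]
    targets = [naive_density(sample, c, width / 2) for c in centers]
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            sse = sum((t - normal_pdf(c, mu, sigma)) ** 2
                      for c, t in zip(centers, targets))
            if best is None or sse < best[0]:
                best = (sse, mu, sigma)
    return best


rng = random.Random(0)
sample = [rng.gauss(0.01, 0.05) for _ in range(2000)]  # simulated "returns"
mu_grid = [i * 0.002 for i in range(-10, 16)]          # -0.02 ... 0.03
sigma_grid = [0.03 + i * 0.002 for i in range(21)]     # 0.03 ... 0.07
sse, mu_hat, sigma_hat = fit_normal_to_naive(sample, 0.01, mu_grid, sigma_grid)
print(mu_hat, sigma_hat, sse)
```

Replacing `normal_pdf` with a GED density and adding its tail-thickness parameter to the search would give the competing fit, and the two minimized sums of squares could then be compared.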

3.1.3.  The Kernel Estimator and the Pseudo-data Maximum
Likelihood Method with a Kernel

Let us continue comparing the performance of the two
assumptions imposed on the distribution of stock returns using
the kernel estimator of density, which is one of the most
commonly used estimators of density. The kernel estimator with
kernel K is given by

$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \tag{3.1.5}$$

where

$$\int_{-\infty}^{\infty} K(x)\,dx = 1 \tag{3.1.6}$$

and h is the smoothing parameter. The kernel estimator can be
interpreted as a sum of 'bumps' placed at the observations.
Two major properties of kernel estimates should be noted here.
One is that the estimator given by Eq. (3.1.5) will be a
probability density if the kernel K() is everywhere
non-negative and satisfies Eq. (3.1.6). The other is that it
will be continuous and differentiable if K() is continuous and
differentiable. In addition, when applied to data from
long-tailed distributions, spurious noise tends to appear in
the tails of the estimates since the window width is fixed
across the entire sample. Of course, the kernel estimator
depends on what kind of kernel function is used, what window
width h is used, and, in turn, how the optimal h is chosen.
The first problem is to determine what kind of kernel function
should be used. The Gaussian and GED kernels are summarized in
Table 3.1.1. In general, the choice of kernel function is made
subjectively according to the author's preference.
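
A minimal sketch of the kernel estimator with a Gaussian kernel is given below, together with a crude numerical check that the estimate integrates to about one. The sample and the window width are made-up values.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kernel_density(sample, x, h, kernel=gaussian_kernel):
    """(1 / (n h)) * sum_i K((x - x_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


returns = [-0.04, -0.01, 0.00, 0.01, 0.012, 0.03, 0.05]  # made-up returns
h = 0.02

# Riemann sum over a range wide enough to contain essentially all the mass.
step = 0.001
grid = [-0.3 + step * i for i in range(601)]
total_mass = sum(kernel_density(returns, x, h) for x in grid) * step
print(total_mass)
```

Any other non-negative function that integrates to one can be passed as `kernel`, which is how a GED kernel would be substituted.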

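Table 3.1.1  Kernel Functions

Kernel      Kernel Function
Gaussian    $K(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$
GED         $K(x) = \frac{\delta \exp\left(-\frac{1}{2}\left|x/\lambda\right|^{\delta}\right)}{\lambda\, 2^{1 + 1/\delta}\, \Gamma(1/\delta)}$, with $\lambda = \left[\frac{\Gamma(1/\delta)}{2^{2/\delta}\, \Gamma(3/\delta)}\right]^{1/2}$

The second problem is how to determine the optimal magnitude
of the window width h. One solution is to choose the h that
minimizes the mean integrated squared error of the estimated
density from the true density:

$$\min_{h,\,\delta} \int E\left[\left(\hat f(x) - f(x)\right)^2\right] dx \tag{3.1.7}$$

where $\delta$ is the unknown parameter in the density function.
The mean squared error can be rewritten as the sum of the
squared bias and the variance. First, the bias is given by

$$\mathrm{bias}_h(x) = f(x) - E[\hat f(x)] = f(x) - \int \frac{1}{h} K\left(\frac{x - y}{h}\right) f(y)\,dy \tag{3.1.8}$$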
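In the above equation, it is not hard to see that the bias in
the estimation of f(x) depends on the window width h and on the
kernel K, but not directly on the sample size. Suppose that the
kernel K() is a symmetric function satisfying

$$\int K(t)\,dt = 1, \quad \int t K(t)\,dt = 0, \quad \int t^2 K(t)\,dt = k_2 \neq 0 \tag{3.1.9}$$

and that the unknown density f() is continuously
differentiable of all orders. Then letting y = x - th and
using a Taylor series expansion of f(x - th), the bias can be
rewritten as

$$\begin{aligned}
\mathrm{bias}_h(x) &= h f'(x) \int t K(t)\,dt - \frac{1}{2} h^2 f''(x) \int t^2 K(t)\,dt + \cdots \\
&= -\frac{1}{2} h^2 f''(x)\, k_2 + \text{higher-order terms in } h
\end{aligned} \tag{3.1.10}$$

In a similar way, the variance of the density estimate can be
approximated by

$$\begin{aligned}
\mathrm{var}\,\hat f(x) &= \frac{1}{n} \int \frac{1}{h^2} K\left(\frac{x - y}{h}\right)^2 f(y)\,dy - \frac{1}{n}\left[f(x) - \mathrm{bias}_h(x)\right]^2 \\
&\approx \frac{1}{nh} \int f(x - ht) K(t)^2\,dt - \frac{1}{n}\left[f(x) + O(h^2)\right]^2 \\
&= \frac{1}{nh} \int \left[f(x) - ht f'(x) + \cdots\right] K(t)^2\,dt + O(n^{-1})
\end{aligned} \tag{3.1.11}$$

Integrating the squared bias and the leading term of the
variance over x, and using $\int f(x)\,dx = 1$, it follows from
the above two equations that the approximate mean integrated
squared error is given by

$$\frac{1}{4} h^4 k_2^2 \int f''(x)^2\,dx + \frac{1}{nh} \int K(t)^2\,dt \tag{3.1.12}$$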
Then the optimal magnitude of h can be obtained by minimizing
the approximate mean integrated squared error of the density
estimate with respect to h as follows:

$$h_{opt} = k_2^{-2/5} \left\{\int K(t)^2\,dt\right\}^{1/5} \left\{\int f''(x)^2\,dx\right\}^{-1/5} n^{-1/5} \tag{3.1.13}$$

As can be seen in Eq. (3.1.13), this method produces
some useful properties of the optimal h. First, $h_{opt}$
converges to zero as n increases, but at a very slow rate.
Second, smaller values of h will be appropriate for more
rapidly fluctuating densities since such densities have larger
absolute values of $f''(x)$.
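
The dependence of the optimal width on n and on the curvature of the density can be made concrete with a small sketch. It assumes a Gaussian kernel (so that $k_2 = 1$ and $\int K(t)^2 dt = 1/(2\sqrt{\pi})$) and, purely as a reference case, a normal true density with standard deviation sigma, for which $\int f''(x)^2 dx = 3/(8\sqrt{\pi}\,\sigma^5)$. The function names are hypothetical.

```python
import math


def h_opt(k2, kernel_roughness, density_curvature, n):
    """h = k2^(-2/5) * (int K^2)^(1/5) * (int f''^2)^(-1/5) * n^(-1/5)."""
    return ((k2 ** -0.4) * (kernel_roughness ** 0.2)
            * (density_curvature ** -0.2) * (n ** -0.2))


def h_opt_normal_reference(sigma, n):
    # int K(t)^2 dt for the Gaussian kernel
    kernel_roughness = 1.0 / (2.0 * math.sqrt(math.pi))
    # int f''(x)^2 dx when f is normal with standard deviation sigma
    density_curvature = 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)
    return h_opt(1.0, kernel_roughness, density_curvature, n)


for n in (100, 1000, 10000):
    print(n, h_opt_normal_reference(0.05, n))
```

In this reference case the expression simplifies to $(4/3)^{1/5}\,\sigma\,n^{-1/5}$, so a hundredfold increase in the sample size shrinks the width only by a factor of about 2.5.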

A critical drawback of this method, however, is that the
true density function is unknown at the stage where the
density function is being estimated. Also, repeated
approximation of the objective function leads to a loss of
information, in other words, to inaccuracy in the calculation
of the ideal window width. And the assumption of a symmetric
kernel function is also somewhat restrictive.

So an alternative method to overcome the problem should not
rely on the true density. Here the maximum likelihood
estimation technique is very useful. The estimation procedure
is as follows. First, generate an arbitrary series,
$\{x^j,\ j = 1, 2, \ldots, M\}$, whose range is sufficiently
larger than that of the sample data. Second, as in conventional
maximum likelihood estimation, construct the log-likelihood
function using Eq. (3.1.5) over the pseudo-data, which
sufficiently cover the range of the sample data:

$$L = \ln \prod_{j=1}^{M} \hat f(x^j) = \sum_{j=1}^{M} \left[ -\ln(nh) + \ln \sum_{i=1}^{n} K\left(\frac{x^j - x_i}{h}\right) \right] \tag{3.1.14}$$

Third, taking the partial derivatives of the log-likelihood
function with respect to h and the unknown parameters $\delta$,
we obtain the optimal value of the window width h, which
satisfies

$$\frac{\partial L}{\partial h} = \frac{\partial}{\partial h} \sum_{j=1}^{M} \left[ -\ln(nh) + \ln \sum_{i=1}^{n} K\left(\frac{x^j - x_i}{h}\right) \right] = 0 \tag{3.1.15}$$


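Suppose that K() is a Gaussian kernel function, that is,

$$K\left(\frac{x^j - x_i}{h}\right) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x^j - x_i}{h}\right)^2\right] \tag{3.1.16}$$

Then, denoting $K((x^j - x_i)/h)$ as $K_{ij}$ and taking the
partial derivative of the kernel function with respect to h,

$$\frac{\partial K_{ij}}{\partial h} = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x^j - x_i}{h}\right)^2\right] \frac{(x^j - x_i)^2}{h^3} = K_{ij}\, \frac{(x^j - x_i)^2}{h^3} \tag{3.1.17}$$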


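Substituting Eq. (3.1.17) back into Eq. (3.1.15), we obtain
the optimal window width, $h_{opt}$, as below:

$$h_{opt} = \left[ \frac{1}{M} \sum_{j=1}^{M} \sum_{i=1}^{n} W_{ij}\, (x^j - x_i)^2 \right]^{1/2} \tag{3.1.18}$$

where

$$W_{ij} = \frac{K_{ij}}{\sum_{i=1}^{n} K_{ij}}, \qquad K_{ij} = K\left(\frac{x^j - x_i}{h}\right), \qquad \sum_{i=1}^{n} W_{ij} = 1$$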

This  will  be  called  the  pseudo-data  maximum  likelihood  method 
with  the  kernel  since  the  kernel  estimator  of  density  is 
estimated  by  applying  the  maximum  likelihood  technique  to  the 
psuedo-data.  The  optimal  window  width  obtained  from  using 
the  maximum  likelihood  estimation  method  has  some  useful 
properties.  First,  unlike  the  minimization  method  of  the  mean 
integrated  squared  error,  the  optimal  h does  not  require  any 
information  concerning  the  true  density  function,  but  relies 
on  the  kernel  function.  This  is  a very  satisfactory  result 
since  it  sounds  contradictory  to  assume  that  the  true  density 
is  known  while  the  true  density  is  under  estimation.  Assume 
that  |x^  - Xii  < 1.  Then  as  the  sample  size  n increases,  given 
X'^,  the  denominator  of  the  Eq  (3.1.18)  expands  faster  than  its 
numerator  does,  since  (x'^-  xj^  < 1.  It  follows  that  the 
optimal  window  width  h will  converge  to  zero  as  the  sample 
size  increases.  This  is  compatible  with  the  result  obtained 
from  the  previous  method.  Third,  the  size  of  the  arbitrary 
series  generated  for  construction  of  likelihood  function  can 
affect  the  optimal  h in  different  directions.  Suppose  that 
$$\sum_{j=1}^{M} \sum_{i=1}^{n} w_{ij}\,(x^{j} - x_{i})^{2} \sim O(M^{s})$$

If s is greater (or less) than 1, the optimal window width will increase (or decrease) as the size M of the arbitrary series increases. If s equals 1, the optimal h will not be affected by changes in M. Lastly, the optimal h can be interpreted as a weighted average of the distances between the observations of the sample and the points generated for construction of the likelihood function.
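
As an illustration, the window width implied by the pseudo-data likelihood with a Gaussian kernel can be computed numerically. The sketch below is not the author's program. It assumes that the first-order condition takes the form $h^{2} = (1/M)\sum_{j}\sum_{i} w_{ij}(x^{j} - x_{i})^{2}$, where the weights $w_{ij}$ themselves depend on h, and it iterates on h until the value settles. The function name `pseudo_ml_bandwidth`, the starting value and the tolerance are hypothetical choices.

```python
import numpy as np

def pseudo_ml_bandwidth(sample, grid, h0=1.0, tol=1e-10, max_iter=500):
    """Fixed-point iteration for the window width (Gaussian kernel assumed).

    sample : observed data x_i (length n)
    grid   : pseudo-data points x^j (length M)
    """
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    d2 = (grid[:, None] - sample[None, :]) ** 2      # (M, n) squared distances
    h = h0
    for _ in range(max_iter):
        log_k = -0.5 * d2 / h**2                     # log Gaussian kernel up to a constant
        log_k -= log_k.max(axis=1, keepdims=True)    # stabilise the exponentials
        w = np.exp(log_k)
        w /= w.sum(axis=1, keepdims=True)            # w_ij = K_ij / sum_i K_ij
        h_new = np.sqrt((w * d2).sum() / len(grid))  # weighted average squared distance
        if abs(h_new - h) < tol:
            return h_new
        h = h_new
    return h
```

With a single observation at 0 and pseudo-data at -1 and 1, every weight equals one and the iteration returns a window width of 1, the root mean squared distance.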

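Next, suppose that K(·) is a GED kernel function, that is,

$$K(z) = \frac{\delta \exp\left( -\tfrac{1}{2} \left| z/\lambda \right|^{\delta} \right)}{\lambda\, 2^{(1 + 1/\delta)}\, \Gamma(1/\delta)}, \qquad \lambda = \left[ \frac{\Gamma(1/\delta)}{2^{2/\delta}\, \Gamma(3/\delta)} \right]^{1/2} \qquad (3.1.19)$$
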


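Taking the partial derivative of Eq (3.1.19) with respect to h, we obtain

$$\frac{\partial K_{ij}}{\partial h} = \frac{\delta}{2h} \left| \frac{x^{j} - x_{i}}{\lambda h} \right|^{\delta} K_{ij} \qquad (3.1.20)$$

Substituting Eq (3.1.20) into Eq (3.1.15) and solving for h, we get the following solution:

$$h_{opt} = \left[ \frac{\delta}{2M} \sum_{j=1}^{M} \sum_{i=1}^{n} w_{ij} \left| \frac{x^{j} - x_{i}}{\lambda} \right|^{\delta} \right]^{1/\delta} \qquad (3.1.21)$$

where

$$w_{ij} = \frac{K_{ij}}{\sum_{i=1}^{n} K_{ij}}$$
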

Some conclusions similar to the case with a Normal kernel can be drawn. First, the optimal window width h does not require any information concerning the true density function. Second, as the sample size increases, given $x^{j}$, the optimal h will converge to zero since $|x^{j} - x_{i}|$ is assumed to be less than one. Note that the speed of convergence to zero will depend upon the value of the tail-thickness parameter $\delta$. Third, the optimal window width obtained under a Gaussian kernel can be regarded as a special case of that obtained under a GED kernel, in which $\delta = 2$: see Eq (3.1.18) and Eq (3.1.21).
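
The special-case claim can be checked numerically. The sketch below assumes the standard unit-variance GED density used in Nelson's work, with tail-thickness parameter `delta` and scale `lam`; the function names are hypothetical. For `delta = 2` the scale equals one and the kernel coincides with the standard normal density.

```python
import math

def ged_lambda(delta):
    """Scale constant that gives the GED unit variance (assumed standard form)."""
    return math.sqrt(math.gamma(1.0 / delta)
                     / (2.0 ** (2.0 / delta) * math.gamma(3.0 / delta)))

def ged_kernel(z, delta):
    """GED density at z with tail-thickness parameter delta."""
    lam = ged_lambda(delta)
    norm = lam * 2.0 ** (1.0 + 1.0 / delta) * math.gamma(1.0 / delta)
    return delta * math.exp(-0.5 * abs(z / lam) ** delta) / norm

def normal_kernel(z):
    """Standard normal density, for comparison with ged_kernel(z, 2.0)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
```

Values of `delta` below 2 give thicker tails than the normal and values above 2 give thinner tails.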


3.1.4.  The  Nearest  Neighbor  Method 


The idea of the nearest neighbour method is to adapt the degree of smoothing to the local density of the data. The degree of smoothing is controlled by an integer k which is considerably smaller than the sample size. Let $d(t, x_{i})$ denote the distance between point t and the i-th point of the sample, $x_{i}$, and assume that the sample points are ordered so that

$$d(t, x_{1}) \le d(t, x_{2}) \le \cdots \le d(t, x_{n})$$

Then the k-th nearest neighbour density estimate is given by

$$\hat{f}(t) = \frac{1}{n\, d(t, x_{k})} \cdot \frac{k - 1}{2} \qquad (3.1.22)$$

The k-th nearest neighbour estimate has two critical drawbacks: it is not a smooth curve, and it is not itself a probability density since it does not integrate to one. Thus the nearest neighbour estimate may not be appropriate if an estimate of the entire density is required.

It is possible to generalize the k-th nearest neighbour density estimate by applying the kernel estimator. The generalized k-th nearest neighbour density estimate is defined by

$$\hat{f}(t) = \frac{1}{n\, d(t, x_{k})} \sum_{i=1}^{n} K\!\left( \frac{t - x_{i}}{d(t, x_{k})} \right) \qquad (3.1.23)$$


3.1.5.  The  Variable  Kernel  Method 

The variable kernel estimate is obtained by combining the generalized k-th nearest neighbour method with the kernel method. The variable kernel estimate of the density with smoothing parameter h is defined by

$$\hat{f}(t) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{h\, d_{j,k}} K\!\left( \frac{t - x_{j}}{h\, d_{j,k}} \right) \qquad (3.1.24)$$

where K(·) is a kernel function, k is a positive integer, and $d_{j,k}$ is the distance from $x_{j}$ to the k-th nearest point in the set comprising the other n-1 data points.

Some important features of the variable kernel method are as follows. First, the variable kernel estimate is a probability density function if the kernel is. Second, like the kernel estimate, it will preserve all the local smoothness properties of the kernel. Third, the window width of the kernel placed on the point $x_{j}$ is proportional to $d_{j,k}$, so that data points in regions where the data are sparse will have flatter kernels. Fourth, for a given k, the degree of smoothing depends on h, and the responsiveness of the window width choice to each data point depends on k.
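
A minimal sketch of the variable kernel estimate follows, assuming a Gaussian kernel and one-dimensional data with distinct values. The function name `variable_kernel_density` is hypothetical. Each data point receives its own window width, h times the distance to its k-th nearest neighbour among the other n-1 points.

```python
import numpy as np

def variable_kernel_density(t, data, h, k):
    """Variable kernel estimate at points t (Gaussian kernel assumed).

    Requires 1 <= k <= n - 1 and distinct data values, so that every
    nearest-neighbour distance is strictly positive.
    """
    data = np.asarray(data, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    pairwise = np.abs(data[:, None] - data[None, :])   # (n, n) distances
    d_k = np.sort(pairwise, axis=1)[:, k]              # column 0 is the zero self-distance
    width = h * d_k                                    # one window width per data point
    z = (t[:, None] - data[None, :]) / width[None, :]
    kern = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return (kern / width[None, :]).mean(axis=1)
```

Because every term is a properly scaled density, the estimate integrates to one, in line with the first feature listed above.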

3.1.6.  Orthogonal  Series  Estimators 

The orthogonal series estimation method estimates a density function f(·) by estimating the coefficients of its Fourier expansion. Suppose that we estimate a density function f(·) over some range, say [a, b], using a sequence of functions $\{\phi_{v}(x)\}$ of the random variable x, and assume that $\{\phi_{v}\}$ is orthogonal on [a, b]. Then the v-th Fourier coefficient of f relative to $\{\phi_{v}\}$ is given by

$$f_{v} = \int_{a}^{b} f(x)\, \phi_{v}(x)\, dx \qquad (v = 1, 2, 3, \ldots) \qquad (3.1.25)$$

Eq (3.1.25) can be rewritten as

$$f_{v} = E[\phi_{v}(x)]$$

so that an unbiased estimator of $f_{v}$ based on a sample $\{x_{i}\}$ from f is given by

$$\hat{f}_{v} = \frac{1}{n} \sum_{i=1}^{n} \phi_{v}(x_{i})$$
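
The estimator of the Fourier coefficients can be sketched with a concrete basis. The example below assumes data on [0, 1] and the orthonormal cosine system $\phi_{0}(x) = 1$, $\phi_{v}(x) = \sqrt{2}\cos(\pi v x)$. Both the basis and the truncation point `n_terms` are illustrative choices, not taken from the text.

```python
import numpy as np

def cosine_basis(v, x):
    """Orthonormal cosine basis on [0, 1] (an assumed choice of basis)."""
    x = np.asarray(x, dtype=float)
    if v == 0:
        return np.ones_like(x)
    return np.sqrt(2.0) * np.cos(np.pi * v * x)

def orthogonal_series_density(x, data, n_terms):
    """Density estimate: sum over v of f_v_hat * phi_v(x), v = 0..n_terms."""
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    estimate = np.zeros_like(x)
    for v in range(n_terms + 1):
        coef = cosine_basis(v, data).mean()   # sample analogue of E[phi_v(X)]
        estimate += coef * cosine_basis(v, x)
    return estimate
```

The estimate integrates to one over [0, 1] because the coefficient on the constant term is exactly one, but a truncated series can take negative values at some points.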
3.1.7. Maximum Penalized Likelihood Estimators

It is possible to get better estimators of a density by balancing the conflict between smoothness and goodness-of-fit to the data. A maximum penalized likelihood estimate is obtained by incorporating into the likelihood function a term which represents the roughness of a density. The penalized likelihood of a curve g based on observations $\{X_{i},\ i = 1, 2, \ldots, n\}$ is given by

$$L_{\alpha}(g) = \sum_{i=1}^{n} \log g(X_{i}) - \alpha R(g)$$

where $\alpha$ is a positive smoothing parameter and R(g) measures the roughness of g. The maximum penalized likelihood estimator of a density function is then the curve g which maximizes $L_{\alpha}(g)$ under the constraints that

$$\int_{-\infty}^{\infty} g(x)\, dx = 1, \qquad g(x) \ge 0 \text{ for all } x, \qquad \text{and } R(g) < \infty.$$

As a result, it follows that the maximum penalized likelihood estimator will be a probability density.
3.1.8. General Weight Function Estimator

The density estimators discussed above can be expressed in a more general form. That is, the general weight function estimator of a density is given by

$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} W(X_{i}, x)$$

where

$$\int_{-\infty}^{\infty} W(X_{i}, x)\, dx = 1 \quad \text{and} \quad W(X_{i}, x) \ge 0 \text{ for all } X_{i} \text{ and } x.$$

The examples are summarized below:

Table 3.1.2 Several Weight Functions

Method                         Weight function
Histogram                      1/h(x) if X_i and x fall in the same bin; 0 otherwise
Kernel Estimator               (1/h) K((x - X_i)/h)
Orthogonal Series Estimator    \sum_{v=0}^{K} \phi_v(X_i) \phi_v(x)
3.2 Comparison between Performances of GED and Normal

In this section, the performance of the GED distribution will be compared with that of the normal distribution by two of the methods discussed above: the least squares estimator using the naive estimation data and the pseudo-data maximum likelihood estimation using the kernel. These methods are chosen because they are relatively easy to estimate and because they show some desirable properties as density estimators.

3.2.1.  Empirical  Results  of  the  Least  Square  Method  Using  the 
Naive  Estimation  Data 

The theoretical distributions are estimated for 3 different sizes of intervals. The results of estimation of the density functions are summarized in Table (3.2.1) and Table (3.2.2). Table (3.2.1) shows that the mean of stock returns is approximately 1.1%, while the tail-thickness variable $\beta$, though close to 0, is approximately 0.14. It follows that the distribution of stock returns has slightly thicker tails than the normal distribution. But this does not make a big difference between the GED and the Normal, since the estimates of $\beta$ are close to 0 in all cases. As can be seen in Table (3.2.3) and Figure (3.2.1), the two distributions are almost indistinguishable except that the GED has slightly higher kurtosis and thicker tails than the normal. The relative advantage of the GED is less than 1% in terms of the magnitude of the estimated mean squared errors, whereas it involves a large computational cost. In particular, the absolute value contained in the GED density function tends to disrupt optimization procedures, since it makes the sign of the gradient vector very unstable. Taking into account both the relative advantage and the computational costs of using the GED distribution rather than a Gaussian distribution, it is very hard to say that the GED distribution is much more appropriate than a Gaussian distribution as the distribution of the monthly stock return indexes of the U.S.A. Another reason why I decide not to employ the GED as the stock return distribution is that, unlike the normal distribution, a linear combination of series which are GED-distributed is not, in general, GED-distributed. So it would be very complicated to derive a Kalman filter using the GED distribution.


Table 3.2.1 Results of Estimation of GED

Interval    σ                    β                    μ
0.0001      0.03671 (0.00030)    0.13903 (0.01245)    0.01075 (0.00015)
0.0005      0.03673 (0.00066)    0.13856 (0.02785)    0.01114 (0.00034)
0.001       0.03678 (0.00093)    0.13651 (0.03931)    0.01163 (0.00048)

( ): standard error of coefficient estimate
Table 3.2.2 Results of Estimation of Normal Distribution

Interval    σ                    μ
0.0001      0.03975 (0.00013)    0.01093 (0.00015)
0.0005      0.03976 (0.00028)    0.01132 (0.00033)
0.001       0.03976 (0.00040)    0.01181 (0.00047)

( ): standard error of coefficient estimate

Table 3.2.3 Comparison between Estimated Mean Squared Errors

Interval    MSE1 (NORMAL)    MSE2 (GED)    MSE2/MSE1
0.0001      8.88284          8.87300       0.99889
0.0005      1.99013          1.98035       0.99509
0.001       1.05450          1.04599       0.99099

3.2.2.  Empirical  Results  of  Estimation  of  a Density  Function 
Using  the  Pseudo-data  MLE  Method  with  the  Kernel 

The optimal window widths obtained when a Gaussian kernel is used to estimate the density function of stock returns by the MLE method are summarized in Table (3.2.4). When the range sufficiently covering the maximum and the minimum of stock returns is divided into 100 points, the optimal window width is 0.28682 and is significantly different from zero at the 5 percent level of significance. The same optimization is applied to different numbers of pseudo-sample points. Table (3.2.4) shows that there appears to exist a globally optimal window width at around 120 pseudo-sample points, since the average likelihood attains its maximum at M equal to 120.


Table 3.2.4 Optimal Window Width with Normal Kernel

M      Window Width    Standard Error    Average Likelihood
100    0.28682         0.02088           -0.184729
120    0.28342         0.01885           -0.173143
150    0.28352         0.01686           -0.173484
200    0.28362         0.01461           -0.173830
300    0.28372         0.01193           -0.174181
500    0.28381         0.00924           -0.174466

On the other hand, the optimal tail-thickness parameter $\delta$ and window width $h_{opt}$ with a GED kernel function are estimated using a GAUSS program. Originally, the estimate of $\delta$ was expected to be within the range of 2.0 ± 1.0. When the estimation is performed over 100 different points as in the Gaussian kernel case, however, the estimate of $\delta$ turns out to be equal to 228.67104 with a standard error of 2094.89694. This implies that it is statistically insignificant. Such a result probably arises from the fact that, unlike the Gaussian case, a linear combination of GED distributions does not necessarily lead to a new GED distribution. But the estimation gives an estimate of the optimal window width which is very close to that obtained under the Gaussian kernel. That is, the estimate of $h_{opt}$ is equal to 0.28691 with a standard error of 0.00836, so it is statistically significant. It implies that there is no big difference between the estimates of the optimal window width obtained under the Gaussian and GED kernels. The results of estimation using the GED kernel function are summarized in Table (3.2.5).

Estimation was also attempted for cases where M is greater than 100, but the programs for the estimation of the optimal window width and tail-thickness parameter never converged.


Table 3.2.5 Optimal Window Width with GED (M = 100)

Parameter         Estimate     S.E.          Ratio       Prob-value
Tail-thickness    228.67104    2094.89694    0.10916     0.91308
Window Width      0.28691      0.00836       34.30180    0.00000


Figure  3.2.1  Comparison  of  GED  with  Normal  Distribution 


CHAPTER  4 

DYNAMIC  SPECIFICATION  OF  A MODEL  USING  THE  KALMAN  FILTER 
WITH  ARCH  DISTURBANCES 

According to finance theory, the stock price in equilibrium is equal to the sum of discounted future dividends which will be paid to its holders. That is,

$$\text{PRICE}_{0} = \sum_{t=1}^{\infty} \frac{\text{DIVIDEND}_{t}}{(1 + \delta_{1}) \cdots (1 + \delta_{t})}$$

where $\delta_{t}$ is a discount factor at time t. Dividend payments at
time  t will  be  made  based  on  the  net  cash  flows  a firm  obtains 
during  a given  time  period  and  the  profitability  of  its 
business  in  the  future.  The  net  cash  flows  which  the  firm 
obtains  during  the  period  will  be  largely  determined  by  the 
prices  and  quantities  of  the  goods  it  produces,  labor  costs 
and  capital  costs.  Thus  changes  in  the  prices  and  quantities 
of  its  products  and  inputs  and  the  profitability  of  its  future 
business  will  lead  to  changes  in  the  stock  prices  of  the  firm. 

On the other hand, suppose that a specific firm with very good business performance issues corporate bonds with high yields. Then the bond issue will act as a signal that the firm is planning very good projects and will have a positive effect on the stock price of the firm. The difference between the yield on Moody's Aa corporate bonds and the short-term interest rate reflects, to some extent, the business conditions of the whole economy. Thus, ceteris paribus, stock returns are determined by the term spread, the inflation rate, and real activity such as the growth rates of GNP and industrial production. Let us assume that the stock return is a linear function of the term spread, the inflation rate and the growth rate of industrial production.
The economy is a dynamic system which adjusts itself incessantly in response to various shocks such as oil shocks, strikes, changes in the political system, changes in the tax regime, wars, changes in the financial environment, innovations in technology and management, and so on. During these adjustments, the relative strength of the various economic forces changes over time. In a specific model, such relations between the economic forces are captured by the magnitudes of the coefficients of the model. Thus it seems more plausible to assume that the coefficients vary over time than to assume that they remain unchanged. For example, the oil shocks during the 1970s caused the prices of petroleum products, and in turn those of petrochemicals and firms' fuel costs, to rise sharply. On the one hand, consumers tried to reduce their consumption of these products or to divert their expenditures to other products. On the other hand, firms tried to divert economic resources to other sectors which were less dependent on petroleum, or to invent more efficient production systems. All of these factors can affect the cash flows and, of course, the stock prices of the firms. But in general it will take a considerably long time for consumers and producers to fully adjust themselves to the new environment because of the difficulty of changing consumers' preferences, restricted budgets, losses associated with the diversion of investment, or inefficient bureaucratic behavior. Also, such changes occur continuously in many sectors and have different impacts on the whole economy. Thus it seems more sensible to assume that, from the point of view of the whole economy, the interrelationships between economic factors change gradually over time rather than abruptly. A smoothly varying parameter model can be regarded as a general form of a discretely varying parameter model.

In this chapter, by explicitly taking account of the time-varying relationships between explanatory variables, I will construct a more dynamic model of stock returns.

4.1 The State Space Model

The dynamic model described above can be expressed in the form of a state space model. The state space model is composed of two important equations, a measurement equation and a transition equation.

4.1.1. A Measurement Equation

$$y_{t} = X_{t}\beta_{t} + d_{t} + \varepsilon_{t}, \qquad t = 1, 2, \ldots, T \qquad (4.1.1a)$$

where $y_{t}$ is an N×1 multivariate time series vector of stock returns for holding stocks from time t-1 to t, $X_{t}$ is an N×m matrix of explanatory variables, $\beta_{t}$ is an m×1 vector, known as the state vector, and $d_{t}$ is an N×1 vector. Assume that $\varepsilon_{t}$ is an N×1 vector of serially uncorrelated disturbances with conditional mean zero and conditional covariance matrix $H_{t}$, that is,

$$E[\varepsilon_{t}] = 0, \qquad Var[\varepsilon_{t}] = H_{t} \qquad (4.1.1b)$$

where $H_{t} = H(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_{t-p})$.

As a first attempt, consider a simple case where $y_{t}$ is a univariate series with error terms following an ARCH(1) process and $d_{t}$ is ignored. The variance of the error, conditional on the past errors, is determined by a linear combination of a constant and the squared error observed at time t-1.

$$y_{t} = x_{t}\beta_{t} + \varepsilon_{t} \qquad (4.1.2a)$$

$$\varepsilon_{t} = h_{t}^{1/2} z_{t} \qquad (4.1.2b)$$

$$\{z_{t},\ -\infty < t < \infty\} \sim \text{iid } N[0, 1] \qquad (4.1.2c)$$

$$h_{t} = q + r\, \varepsilon_{t-1}^{2} \qquad (4.1.2d)$$

where $x_{t}$ is the 1×m vector of explanatory variables while $\beta_{t}$ is the m×1 vector of coefficients. If $\varepsilon_{t-1}$ is fixed and known at time t-1, the distribution of $\varepsilon_{t}$ conditional on past observations will be normal with mean zero and variance $h_{t}$. However, such a case rarely happens in empirical studies.
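
The ARCH(1) recursion for the disturbances can be made concrete with a short sketch. It assumes the intercept and slope of the variance equation are named `q` and `r`, takes the standardized shocks `z` as given, and uses a hypothetical start-up value `eps0` for the disturbance before the first period.

```python
import numpy as np

def arch1_errors(z, q, r, eps0=0.0):
    """Build ARCH(1) disturbances eps_t = sqrt(h_t) * z_t with h_t = q + r * eps_{t-1}^2."""
    z = np.asarray(z, dtype=float)
    eps = np.empty_like(z)
    h = np.empty_like(z)
    prev = eps0                        # assumed pre-sample disturbance
    for t in range(len(z)):
        h[t] = q + r * prev**2         # conditional variance given the last disturbance
        eps[t] = np.sqrt(h[t]) * z[t]
        prev = eps[t]
    return eps, h
```

Feeding in draws from a standard normal for `z` yields disturbances that are serially uncorrelated but not independent, since the variance of each one depends on the size of its predecessor.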
Thus it should be noted that the distribution of $\varepsilon_{t}$ conditional on past observations is symmetric but is not, in general, normal, since knowledge of past observations does not necessarily imply knowledge of past disturbances. For simplicity, we will assume that the distribution of $\varepsilon_{t}$ conditional on past observations is normal with mean zero and variance $E_{t-1}[\varepsilon_{t}^{2}]$. Let $\Omega_{t-1}$ denote the set of information on observations up to and including $y_{t-1}$, i.e., $\Omega_{t-1} = \{y_{t-1}, y_{t-2}, \ldots, y_{1}\}$. And let $b_{t-1}$ and $P_{t-1}$ denote the optimal estimate of $\beta_{t-1}$ and the m×m covariance matrix of $b_{t-1}$ conditional on $\Omega_{t-1}$, respectively. Taking the expectation conditional on $\Omega_{t-1}$ of both sides of Eq (4.1.2d),

$$E_{t-1}[h_{t}] = q + r\, E_{t-1}[\varepsilon_{t-1}^{2}] \qquad (4.1.3)$$

From Eq (4.1.2a),

$$\varepsilon_{t-1} = y_{t-1} - x_{t-1}\beta_{t-1} = (y_{t-1} - x_{t-1}b_{t-1}) - x_{t-1}(\beta_{t-1} - b_{t-1}) \qquad (4.1.4a)$$

$$E_{t-1}[\varepsilon_{t-1}^{2}] = e_{t-1}^{2} + x_{t-1}P_{t-1}x_{t-1}' \qquad (4.1.4b)$$

Substituting Eq (4.1.4b) into Eq (4.1.3) gives the variance of $\varepsilon_{t}$ conditional on the information set available at time t-1,

$$E_{t-1}[\varepsilon_{t}^{2}] = q + r\,(e_{t-1}^{2} + x_{t-1}P_{t-1}x_{t-1}') \qquad (4.1.3)'$$

where $e_{t-1} = y_{t-1} - x_{t-1}b_{t-1}$.


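4.1.2. A Transition Equation

The time-varying relationships between the stock returns and the explanatory variables imply that the state vector $\beta_{t}$ changes over time. Let us assume that $\beta_{t}$ is generated by a first-order Markov process,

$$\beta_{t} = M_{t}\beta_{t-1} + c_{t} + R_{t}u_{t}, \qquad t = 1, 2, \ldots, T \qquad (4.1.5a)$$

where $M_{t}$ is an m×m matrix, $c_{t}$ is an m×1 vector, $R_{t}$ is an m×g matrix and $u_{t}$ is a g×1 vector of serially uncorrelated disturbances with mean zero and covariance matrix $Q_{t}$. Assume that the covariance matrix of $u_{t}$ conditional on the past follows an ARCH(1) process as follows:

$$u_{t} = Q_{t}^{1/2} w_{t} \qquad (4.1.5b)$$

$$\{w_{t},\ -\infty < t < \infty\} \sim NID(0, 1) \qquad (4.1.5c)$$

$$Q_{t} = C + D \mathbin{.*} u_{t-1}u_{t-1}' \qquad (4.1.5d)$$

where C and D are m×m symmetric matrices and the notation .* stands for element-by-element multiplication of two matrices. Consider a simple case where the $M_{t}$ matrix remains unchanged over time, $c_{t}$ is ignored, and the elements of $u_{t}$ enter the transition equation separately so that $R_{t}$ is an identity matrix. Eq (4.1.5a) will then be replaced by the following expression:

$$\beta_{t} = M\beta_{t-1} + u_{t}, \qquad t = 1, 2, \ldots, T \qquad (4.1.5a)'$$

Footnote: In element form,

$$\begin{bmatrix} Q_{11t} & Q_{12t} & \cdots & Q_{1mt} \\ Q_{21t} & Q_{22t} & \cdots & Q_{2mt} \\ \vdots & \vdots & & \vdots \\ Q_{m1t} & Q_{m2t} & \cdots & Q_{mmt} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1m} \\ c_{21} & c_{22} & \cdots & c_{2m} \\ \vdots & \vdots & & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{mm} \end{bmatrix} + \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1m} \\ d_{21} & d_{22} & \cdots & d_{2m} \\ \vdots & \vdots & & \vdots \\ d_{m1} & d_{m2} & \cdots & d_{mm} \end{bmatrix} \mathbin{.*} u_{t-1}u_{t-1}'$$

$$Q_{ijt} = Cov(u_{it}, u_{jt}) = c_{ij} + d_{ij}\, u_{i,t-1}u_{j,t-1} = Q_{jit}, \qquad \text{for } i, j = 1, 2, \ldots, m$$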

As in the previous case, if $u_{t-1}$ is fixed and known at time t-1, then the distribution of $u_{t}$ conditional on past observations will be normal with mean zero and variance-covariance matrix $Q_{t}$. Otherwise, it is not, in general, normally distributed. But for simplicity, it will be assumed to be normally distributed with mean zero and variance $E_{t-1}[u_{t}u_{t}']$.

$$E_{t-1}[u_{t}u_{t}'] = C + D \mathbin{.*} E_{t-1}[u_{t-1}u_{t-1}'] \qquad (4.1.6a)$$

or, element by element,

$$E_{t-1}[u_{it}u_{jt}] = c_{ij} + d_{ij}\, E_{t-1}[u_{i,t-1}u_{j,t-1}] \qquad (4.1.6b)$$

From Eq (4.1.5a)',

$$u_{t-1} = \beta_{t-1} - M\beta_{t-2} = (\beta_{t-1} - b_{t-1}) + (b_{t-1} - Mb_{t-2|t-1}) + M(b_{t-2|t-1} - \beta_{t-2})$$

where $b_{t-2|t-1}$ is the optimal estimator of $\beta_{t-2}$ obtained using the information available at time t-1. The conditional variance of $u_{t-1}$ is given by

$$E_{t-1}[u_{t-1}u_{t-1}'] = P_{t-1} + v_{t-1}v_{t-1}' - P_{t-2,t-1|t-1}'M' - MP_{t-2,t-1|t-1} + MP_{t-2|t-1}M' \qquad (4.1.7)$$

where $v_{t-1} = b_{t-1} - Mb_{t-2|t-1}$,

$$P_{t-2,t-1|t-1} = E_{t-1}[(b_{t-2|t-1} - \beta_{t-2})(b_{t-1} - \beta_{t-1})'] \quad \text{and} \quad P_{t-2|t-1} = E_{t-1}[(b_{t-2|t-1} - \beta_{t-2})(b_{t-2|t-1} - \beta_{t-2})'].$$

Substituting Eq (4.1.7) into Eq (4.1.6) produces the variance of $u_{t}$ conditional on $\Omega_{t-1}$ as below,

$$E_{t-1}[u_{t}u_{t}'] = C + D \mathbin{.*} [P_{t-1} + v_{t-1}v_{t-1}' - P_{t-2,t-1|t-1}'M' - MP_{t-2,t-1|t-1} + MP_{t-2|t-1}M'] \qquad (4.1.7)'$$
From now on, let $h_{t}^{*}$ and $Q_{t}^{*}$ denote $E_{t-1}[\varepsilon_{t}^{2}]$ and $E_{t-1}[u_{t}u_{t}']$, respectively, for notational convenience.

As can be seen in Eq (4.1.6), the ij-th element of the variance-covariance matrix of $u_{t}$ based on the observations up to and including time t-1 is linearly determined by the ij-th element of the cross product of $u_{t-1}$ conditional on the same information set. This model will be called matrix autoregressive conditional heteroscedasticity (MARCH). It can be regarded as a more general form of the standard ARCH model, in which only the variance terms of the disturbances are assumed to follow ARCH processes, since the MARCH model enables the covariance terms to be utilized in estimating the unknown parameters. We can therefore get better estimators than otherwise, since the covariance terms, which are ignored in a standard ARCH model, carry some additional information. It follows that the difference, $Q_{t}^{*}(\text{ARCH}) - Q_{t}^{*}(\text{MARCH})$, will be positive definite.

Finally, it should be stressed that although the $\varepsilon_{t}$'s are not independent of each other, they are serially uncorrelated; so are the $u_{t}$'s even if, like the $\varepsilon_{t}$'s, they are also not independent of each other.

In order to derive the Kalman filter, we still need more assumptions on the initial state vector and on the relations between the disturbance terms in the measurement and transition equations and the initial state vector. First, the initial state vector, $\beta_{0}$, has a mean of $b_{0}$ and a covariance matrix of $P_{0}$, that is,

$$E(\beta_{0}) = b_{0} \quad \text{and} \quad Var(\beta_{0}) = P_{0}. \qquad (4.1.8a)$$

Second, the disturbances $u_{t}$ and $\varepsilon_{t}$ are uncorrelated with each other in all time periods, and uncorrelated with the initial state, that is,

$$E(u_{t}\varepsilon_{s}') = 0 \quad \text{for all } t, s = 1, 2, \ldots, T$$

$$\text{and} \quad E(u_{t}\beta_{0}') = 0, \quad E(\varepsilon_{t}\beta_{0}') = 0 \quad \text{for } t = 1, 2, \ldots, T \qquad (4.1.8b)$$

4.2 Derivation of the Kalman Filter

The Kalman filter provides a good way of computing the optimal estimates of the state vector, $\beta_{t}$, by applying a recursive procedure to the state space form, based on the information available at time t.

For the time being, let us assume that $\varepsilon_{t}$ and $u_{t}$ are normally distributed with means and variances described in Eq (4.1.1) and Eq (4.1.2). The Kalman filter consists of two crucial groups of equations, the prediction equations and the updating equations. Let $b_{t|t-1}$ denote the mean of the distribution of $\beta_{t}$ conditional on the information available at time t-1 and $P_{t|t-1}$ the m×m covariance matrix of the estimate of $\beta_{t}$ conditional on the information available at time t-1. Given the assumption of normality of $\varepsilon_{t}$ and $u_{t}$ as well as the information available at time t-1, $\beta_{t}$ is normally distributed with a mean of $b_{t|t-1}$ and a covariance matrix of $P_{t|t-1}$. Taking the conditional expectation of both sides of Eq (4.1.5a)', we obtain

$$b_{t|t-1} = Mb_{t-1}, \qquad t = 1, 2, \ldots, T \qquad (4.2.1a)$$

where $b_{t-1}$ is assumed to be known at time t-1. From Eq (4.1.5a)' and Eq (4.2.1a),

$$\beta_{t} - b_{t|t-1} = M(\beta_{t-1} - b_{t-1}) + u_{t}$$

So the variance of $\beta_{t}$ conditional on $\Omega_{t-1}$ is given by

$$P_{t|t-1} = E_{t-1}[(\beta_{t} - b_{t|t-1})(\beta_{t} - b_{t|t-1})'] = MP_{t-1}M' + Q_{t}^{*}, \qquad t = 1, 2, \ldots, T \qquad (4.2.1b)$$

where $P_{t-1} = E[(\beta_{t-1} - b_{t-1})(\beta_{t-1} - b_{t-1})']$ is assumed to be known at time t-1. The above two equations, Eq (4.2.1a) and Eq (4.2.1b), are called the prediction equations.

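As the information about time t becomes available, the estimate of $\beta_{t}$ can be updated. It should be noted that $\Omega_{t} = \{y_{t}, y_{t-1}, \ldots, y_{1}\}$, so that the distribution of $\beta_{t}$ conditional on $\Omega_{t}$ is identical to that conditional on $y_{t}$ (given $\Omega_{t-1}$). In order to get the optimal estimator of $\beta_{t}$ based on $y_{t}$, we first need to find the multivariate joint distribution of $\beta_{t}$ and $y_{t}$. Under the assumption that $\varepsilon_{t}$ and $u_{t}$ conditional on $\Omega_{t-1}$ are normally distributed, $\beta_{t}$ conditional on $\Omega_{t-1}$ is normally distributed with mean $b_{t|t-1}$ and covariance matrix $P_{t|t-1}$. The normality assumption also leads $y_{t} = x_{t}\beta_{t} + \varepsilon_{t}$ conditional on $\Omega_{t-1}$ to be normally distributed with mean and variance as below.

$$E_{t-1}[y_{t}] = E_{t-1}[x_{t}\beta_{t} + \varepsilon_{t}] = x_{t}b_{t|t-1} \qquad (4.2.2a)$$

The prediction error $s_{t}$ based on $\Omega_{t-1}$ is given by

$$s_{t} = y_{t} - x_{t}b_{t|t-1} = x_{t}M[\beta_{t-1} - b_{t-1}] + \varepsilon_{t} + x_{t}u_{t}$$

So the conditional variance of $y_{t}$ is defined as

$$Var_{t-1}[y_{t}] = E_{t-1}[s_{t}s_{t}'] = x_{t}[MP_{t-1}M' + Q_{t}^{*}]x_{t}' + h_{t}^{*} = x_{t}P_{t|t-1}x_{t}' + h_{t}^{*} \qquad (4.2.2b)$$

The covariance matrix between $\beta_{t}$ and $y_{t}$ conditional on $\Omega_{t-1}$ is

$$E_{t-1}[(\beta_{t} - b_{t|t-1})s_{t}'] = P_{t|t-1}x_{t}' \qquad (4.2.2c)$$

From Eq (4.2.1) and Eq (4.2.2), we can deduce that the joint distribution of $\beta_{t}$ and $y_{t}$ conditional on $\Omega_{t-1}$ is normal with mean and covariance matrix as below.

$$\begin{bmatrix} \beta_{t} \\ y_{t} \end{bmatrix} \Big| \, \Omega_{t-1} \sim N\left( \begin{bmatrix} b_{t|t-1} \\ x_{t}b_{t|t-1} \end{bmatrix}, \begin{bmatrix} P_{t|t-1} & P_{t|t-1}x_{t}' \\ x_{t}P_{t|t-1} & x_{t}P_{t|t-1}x_{t}' + h_{t}^{*} \end{bmatrix} \right) \qquad (4.2.2d)$$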


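Applying a well-known lemma in statistics (stated in the footnote below) to Eq (4.2.2) produces the optimal estimator $b_{t}$ of $\beta_{t}$ and its covariance matrix $P_{t}$ conditional on $\Omega_{t}$ as below.

$$b_{t} = b_{t|t-1} + P_{t|t-1}x_{t}'f_{t}^{-1}(y_{t} - x_{t}b_{t|t-1}) \qquad (4.2.3a)$$

$$\text{and} \quad P_{t} = P_{t|t-1} - P_{t|t-1}x_{t}'f_{t}^{-1}x_{t}P_{t|t-1} \qquad (4.2.3b)$$

$$\text{where} \quad f_{t} = x_{t}P_{t|t-1}x_{t}' + h_{t}^{*}, \qquad t = 1, 2, \ldots, T \qquad (4.2.3c)$$

Eq (4.2.3a), Eq (4.2.3b), and Eq (4.2.3c) are called the updating equations.

Footnote (Lemma): Suppose that the pair of vectors x and y is jointly multivariate normal such that

$$\begin{bmatrix} x \\ y \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_{x} \\ \mu_{y} \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix} \right)$$

Then the distribution of x conditional on y is multivariate normal with a mean of $\mu_{x|y} = \mu_{x} + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_{y})$ and a covariance matrix of $\Sigma_{xx|y} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$.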

Taking the prediction equations and the updating equations together gives a unified system of equations, called the Riccati equations.

$$b_{t+1|t} = M[b_{t|t-1} + P_{t|t-1}x_{t}'f_{t}^{-1}(y_{t} - x_{t}b_{t|t-1})] \qquad (4.2.4a)$$

$$\text{and} \quad P_{t+1|t} = M(P_{t|t-1} - P_{t|t-1}x_{t}'f_{t}^{-1}x_{t}P_{t|t-1})M' + Q_{t+1}^{*} \qquad (4.2.4b)$$

Using the Riccati equations, we can update the estimator of $\beta_{t}$ as new observations are added to the information set.
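
One pass through the prediction and updating equations can be sketched as follows for a univariate observation. This is an illustrative sketch, not the author's estimation program. It assumes a time-invariant transition matrix `M`, takes the conditional variances (`h_star` for the measurement error and `Q_star` for the transition error) as inputs, and assumes the measurement variance has the form q + r(e² + xPx'). All function and argument names are hypothetical.

```python
import numpy as np

def arch_measurement_variance(q, r, e_prev, x_prev, P_prev):
    """Assumed form: h*_t = q + r * (e_{t-1}^2 + x_{t-1} P_{t-1} x_{t-1}')."""
    x_prev = np.asarray(x_prev, dtype=float)
    return q + r * (e_prev**2 + x_prev @ P_prev @ x_prev)

def kalman_step(y, x, b_prev, P_prev, M, Q_star, h_star):
    """One prediction and updating step for scalar y and 1 x m regressor x."""
    x = np.asarray(x, dtype=float)
    # Prediction equations
    b_pred = M @ b_prev
    P_pred = M @ P_prev @ M.T + Q_star
    # Prediction error and its variance
    s = y - x @ b_pred
    f = x @ P_pred @ x + h_star
    # Updating equations
    gain = P_pred @ x / f
    b = b_pred + gain * s
    P = P_pred - np.outer(gain, x @ P_pred)
    return b, P, s, f
```

Calling `kalman_step` repeatedly, with `h_star` and `Q_star` recomputed each period from the previous step's output, reproduces the recursion described above.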

On the other hand, the evaluation of the prediction, updating, and Riccati equations requires both $P_{t-2,t-1|t-1}$ and $P_{t-2|t-1}$ to be computed. As can be seen above, the covariance between $\beta_{t-1}$ and $\beta_{t-2}$ conditional on the information available at time t-1 is given by

$$P_{t-2,t-1|t-1} = E_{t-1}[(b_{t-2|t-1} - \beta_{t-2})(b_{t-1} - \beta_{t-1})'] = (x_{t-2}'x_{t-2})^{-1}x_{t-2}'\, e_{t-2}e_{t-1}'\, x_{t-1}(x_{t-1}'x_{t-1})^{-1}. \qquad (4.2.5a)$$

-proof-

$$y_{t-1} = x_{t-1}\beta_{t-1} + \varepsilon_{t-1} \quad \text{from Eq (4.1.1a)} \qquad (4.2.5b)$$

$$E_{t-1}(y_{t-1}) = x_{t-1}b_{t-1} \qquad (4.2.5c)$$

by taking the conditional expectation. Subtracting Eq (4.2.5c) from Eq (4.2.5b),

$$y_{t-1} - E_{t-1}(y_{t-1}) = x_{t-1}(\beta_{t-1} - b_{t-1}) + \varepsilon_{t-1} \qquad (4.2.5d)$$

Premultiplying both sides of Eq (4.2.5d) by $x_{t-1}'$ and then by $(x_{t-1}'x_{t-1})^{-1}$,

$$\beta_{t-1} - b_{t-1} = (x_{t-1}'x_{t-1})^{-1}x_{t-1}'[y_{t-1} - E_{t-1}(y_{t-1}) - \varepsilon_{t-1}] \qquad (4.2.5e)$$

In a similar way, we can get

$$\beta_{t-2} - b_{t-2|t-1} = (x_{t-2}'x_{t-2})^{-1}x_{t-2}'[y_{t-2} - E_{t-1}(y_{t-2}) - \varepsilon_{t-2}] \qquad (4.2.5f)$$

Taking the conditional expectation of the product of Eq (4.2.5e) and Eq (4.2.5f) gives the result described in Eq (4.2.5a), since $E(\varepsilon_{t-2}\varepsilon_{t-1}') = E_{t-1}(\varepsilon_{t-2}\varepsilon_{t-1}') = 0$. -Q.E.D.-

Next, it follows from the smoothing equations that the variance-covariance matrix of $\beta_{t-2}$ conditional on the information available at time t-1 is given by

$$P_{t-2|t-1} = P_{t-2} + P_{t-2}^{*}(P_{t-1} - P_{t-1|t-2})P_{t-2}^{*\prime} \qquad (4.2.5g)$$

where $P_{t-2}^{*} = P_{t-2}M'P_{t-1|t-2}^{-1}$.

A smoothing algorithm provides a recursive procedure which yields estimates of the coefficients of past periods using the cumulative set of information already known to the analyst. The fixed-interval smoothing estimates, proposed by Ansley and Kohn (1982), are given by

$$b_{t|T} = b_{t} + P_{t}^{*}(b_{t+1|T} - Mb_{t}) \qquad (4.2.5h)$$

$$\text{and} \quad P_{t|T} = P_{t} + P_{t}^{*}(P_{t+1|T} - P_{t+1|t})P_{t}^{*\prime} \qquad (4.2.5i)$$

$$\text{where} \quad P_{t}^{*} = P_{t}M'P_{t+1|t}^{-1}, \qquad t = T-1, T-2, \ldots, 1 \qquad (4.2.5j)$$
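
The backward recursion of the fixed-interval smoother can be sketched as below, assuming the filtered estimates and the one-step-ahead covariance matrices have already been stored from a forward pass. The function name and the list-based storage layout are hypothetical.

```python
import numpy as np

def fixed_interval_smoother(b_filt, P_filt, P_pred, M):
    """Backward smoothing recursion.

    b_filt[t], P_filt[t] : filtered mean and covariance for period t (0-based)
    P_pred[t]            : one-step-ahead covariance for period t+1 given period t
    M                    : time-invariant transition matrix
    """
    T = len(b_filt)
    b_s = [None] * T
    P_s = [None] * T
    b_s[-1], P_s[-1] = b_filt[-1], P_filt[-1]      # last smoothed value equals the filtered one
    for t in range(T - 2, -1, -1):
        P_star = P_filt[t] @ M.T @ np.linalg.inv(P_pred[t])
        b_s[t] = b_filt[t] + P_star @ (b_s[t + 1] - M @ b_filt[t])
        P_s[t] = P_filt[t] + P_star @ (P_s[t + 1] - P_pred[t]) @ P_star.T
    return b_s, P_s
```

Stopping the recursion after a single backward step from period t-1 gives the one-lag smoothed covariance needed in the evaluation of the conditional variance of the transition disturbance.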

4.3  Noteworthy Features of the Estimators Obtained Using the Kalman Filter

Under the assumption that $\varepsilon_t$ and $u_t$ conditional on the information available at time t-1 are normally distributed, Kalman filtering gives the estimators of the unknown parameters some desirable properties. (i) The Kalman filter produces the mean $b_t$ of the conditional distribution of $\beta_t$ as an optimal estimate of $\beta_t$ in the sense that it minimizes the mean squared error (known as the minimum mean square estimator, MMSE):

$$b_t = E_t(\beta_t) = E(\beta_t \mid y_t)$$  (4.3.1)

As can be seen in Eq (4.2.3a), the estimator $b_t$ relies on the unknown parameters to be estimated in the next section and on the mean and covariance of the initial state vector $\beta_0$. Furthermore, the parameter estimators obtained from minimizing the sum of squared prediction errors, i.e., $\sum_t s_t^2$, are equivalent to those obtained from maximizing the conditional log-likelihood function, provided that the Kalman filter converges. The next question then is when the Kalman filter converges to a steady state exponentially fast. When the transition equation is time invariant, the model should be observable and controllable, and the characteristic roots of the transition matrix should be less than one in absolute value. When both the measurement and transition equations are time invariant, the model should be detectable and stabilizable.

The final question is what the controllability, observability, detectability, and stabilizability of the model imply. Consider a simple model such that

$$y_t = x\beta_t, \qquad \beta_t = M\beta_{t-1} + u_t$$

Then the model is controllable if $\mathrm{Rank}[I, M, \ldots, M^{m-1}] = m$, observable if $\mathrm{Rank}[x', M'x', \ldots, (M')^{m-1}x'] = m$, stabilizable if $\exists\, W$ s.t. $|\lambda_i(M + W')| < 1$ for $i = 1, \ldots, m$, and detectable if $\exists\, T$ s.t. $|\lambda_i(M - Tx)| < 1$ for $i = 1, \ldots, m$, where m is the dimension of the state vector and $\lambda_i(\cdot)$ denotes the ith characteristic root.

So if the model satisfies the above conditions, the estimators of the unknown parameters obtained from minimizing the sum of squared prediction errors will coincide with those obtained from maximizing the conditional log-likelihood. It follows that $b_t$ based on the ML estimators of the unknown parameters is equivalent to that based on the MMSE estimators.
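
The two rank conditions can be checked mechanically. The Python sketch below is illustrative only: it assumes a time-invariant model with state dimension m, a 1 x m measurement row vector x, and disturbances entering the transition equation through an identity matrix, as in the simple model above. The helper names (`rank`, `is_controllable`, `is_observable`) are hypothetical.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def hstack(blocks):
    """Place matrices with the same number of rows side by side."""
    return [sum((blk[i] for blk in blocks), []) for i in range(len(blocks[0]))]


def rank(a):
    """Matrix rank by Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in a]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_controllable(M):
    """Rank[I, M, ..., M^(m-1)] == m."""
    m = len(M)
    identity = [[int(i == j) for j in range(m)] for i in range(m)]
    blocks, power = [identity], identity
    for _ in range(m - 1):
        power = mat_mul(power, M)
        blocks.append(power)
    return rank(hstack(blocks)) == m


def is_observable(M, x):
    """Rank[x', M'x', ..., (M')^(m-1) x'] == m, with x a 1 x m row vector."""
    m = len(M)
    Mt, col = transpose(M), transpose(x)
    blocks = [col]
    for _ in range(m - 1):
        col = mat_mul(Mt, col)
        blocks.append(col)
    return rank(hstack(blocks)) == m


# A random-walk transition (M = I) observed through x = [1, 0] is not observable:
# the second coefficient never affects y.
print(is_observable([[1, 0], [0, 1]], [[1, 0]]))
print(is_observable([[1, 1], [0, 1]], [[1, 0]]))
```

Because the identity matrix is the first block of the controllability matrix, the controllability rank condition holds automatically in this simple model; the observability condition is the one that can fail.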

(ii) The estimator $b_t$ is unconditionally unbiased, since by the law of iterated expectations

$$E[\beta_t - b_t] = E(\beta_t) - E[E_t(\beta_t)] = E(\beta_t) - E(\beta_t) = 0$$

(iii) The Kalman filter also yields the unconditional covariance matrix of the estimation error,

$$P_t = E_t[(\beta_t - E_t(\beta_t))(\beta_t - E_t(\beta_t))']$$  (4.3.2)

(iv) The conditional mean of $y_t$ based on the information available at time t-1, $y_{t|t-1} = X_tb_{t|t-1}$, will be the MMSE of $y_t$.

(v) The prediction error, $s_t = y_t - y_{t|t-1} = X_tM(\beta_{t-1} - b_{t-1}) + \varepsilon_t + X_tu_t$, conditional on the information available at time t-1, is normally distributed with the mean and variance $f_t$ given below, and the prediction errors are independent of each other.

$$E_{t-1}(s_t) = X_tME_{t-1}(\beta_{t-1} - b_{t-1}) + E_{t-1}(\varepsilon_t) + X_tE_{t-1}(u_t) = 0$$

$$Var_{t-1}(s_t) = X_tP_{t|t-1}X_t' + h_t = f_t$$

In other words,

$$s_t \sim NID(0, f_t)$$  (4.3.3)

Next, let us consider a non-Gaussian model in which the normality assumption on the disturbance terms is dropped. (i) The Kalman filter still produces an optimal estimator in the sense that it minimizes the MSE within the class of all linear estimators (known as the minimum mean square linear estimator, MMSLE), even though the optimal estimate $b_t$ may no longer be the conditional mean of the state vector. (ii) $b_t$ is also unconditionally unbiased. (iii) The unconditional covariance matrix of the estimation error is the $P_t$ given by the Kalman filter. (iv) The conditional mean of $y_t$ at time t-1, $y_{t|t-1}$, will be the MMSLE rather than the MMSE of $y_t$. (v) The prediction error, $s_t$, is distributed with mean zero and variance $f_t$, and is uncorrelated with the prediction errors of other time periods.


4.4  Estimation  of  the  Model 

4.4.1.  Constructing  the  Objective  Function 

In spite of the desirable properties of maximum likelihood estimators (MLE), I will estimate the model by a non-linear least squares (NLLS) method for the following reasons. First, the MLE method involves higher computational costs than the NLLS method. Second, the parameter estimators obtained from the NLLS method are equivalent to those obtained from the MLE method if the Kalman filter converges.

The next step is to construct the objective function. In our model, the objective function to be minimized is the sum of the squared prediction errors, i.e.,

$$SSE_T(\theta) = \sum_{t=1}^{T}(y_t - y_{t|t-1})^2 = \sum_{t=1}^{T} s_t^2$$  (4.4.1a)
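
As a hedged illustration of how the objective in (4.4.1a) is evaluated, the Python sketch below accumulates the squared one-step prediction errors while running the filter. It assumes a scalar state and constant variances q and h (no ARCH effects), which is a simplification of the model in the text; the parameter vector `theta = (m, q, h)` and the function name `sse_prediction_errors` are hypothetical.

```python
def sse_prediction_errors(theta, y, x, b0, p0):
    """Sum of squared one-step prediction errors for a scalar time-varying coefficient.

    Model: y_t = x_t*beta_t + e_t, beta_t = m*beta_{t-1} + u_t,
    with Var(u_t) = q and Var(e_t) = h, and theta = (m, q, h).
    """
    m, q, h = theta
    b, p, sse = b0, p0, 0.0
    for yt, xt in zip(y, x):
        b_pred = m * b                    # prediction equation for the state
        p_pred = m * p * m + q            # prediction equation for its variance
        s = yt - xt * b_pred              # prediction error s_t
        f = xt * p_pred * xt + h          # its variance f_t
        sse += s * s
        gain = p_pred * xt / f
        b = b_pred + gain * s             # updating equations
        p = p_pred - gain * xt * p_pred
    return sse


# Illustrative data generated with a constant coefficient of 2 and no noise.
x_demo = [1.0, 0.5, -1.0, 2.0]
y_demo = [2.0, 1.0, -2.0, 4.0]
print(sse_prediction_errors((1.0, 0.01, 0.1), y_demo, x_demo, b0=2.0, p0=1.0))
```

An optimizer would call such a function repeatedly with different values of theta; the filter has to be rerun from the initial state for each candidate parameter vector.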


For testing, we need one additional assumption on the distribution of the disturbance terms conditional on the information available at time t-1. In fact, it is unlikely that the disturbance terms in the state space model are independently and normally distributed without precise knowledge of past disturbances, which would make hypothesis testing very complicated. In order to avoid this problem, we will assume that the disturbance terms conditional on the information available at time t-1 are normally distributed. It follows that the prediction error $s_t$ is independently and normally distributed.

Suppose that it is known that the initial state vector, $\beta_0$, has a proper prior distribution with known mean, $b_0$, and bounded covariance matrix, $P_0$. Then the Kalman filter produces the exact likelihood function of the observations, y. In practice, a proper prior distribution is hardly ever available. In this case, the initial distribution of $\beta_0$ must be specified in terms of a diffuse or non-informative prior.^ In this paper, however, I use the OLS estimates and their covariance matrix, obtained from the first 60 observations, as the conditional mean and covariance matrix of the initial state vector, respectively.

^ Several algorithms have been proposed for obtaining an appropriate prior distribution. Rosenberger (1973) and Wecker and Ansley (1983) proposed algorithms in which the elements of the initial state vector are treated as fixed, so that they need to be estimated with either the maximum likelihood estimation technique or the GLS technique. Harvey and Phillips (1979), Burridge and Wallis (1985),
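
The initialization described above can be sketched as follows. This is an illustration rather than the author's code: it assumes a single regressor with no intercept, whereas the model in the text has four coefficients, and the function name `ols_initial_state` and the argument `n_init` are hypothetical.

```python
def ols_initial_state(y, x, n_init=60):
    """OLS slope and its sampling variance from the first n_init observations.

    Returns (b0, p0), intended as the mean and variance of the initial state
    in a scalar time-varying coefficient model.
    """
    y0, x0 = y[:n_init], x[:n_init]
    sxx = sum(v * v for v in x0)
    b0 = sum(xv * yv for xv, yv in zip(x0, y0)) / sxx
    residuals = [yv - b0 * xv for xv, yv in zip(x0, y0)]
    sigma2 = sum(r * r for r in residuals) / (len(y0) - 1)   # one estimated coefficient
    p0 = sigma2 / sxx
    return b0, p0


# Illustrative data: exact relation y = 3x for the first 60 points, zeros afterwards.
x_demo = [float(i) for i in range(1, 81)]
y_demo = [3.0 * v if i < 60 else 0.0 for i, v in enumerate(x_demo)]
b0, p0 = ols_initial_state(y_demo, x_demo)
```

Only the first 60 observations enter the calculation, so later data do not influence the starting values of the filter.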

4.4.2.  Derivatives  of  the  Objective  Function 


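Taking the partial derivative of the sum of the squared prediction errors stated in (4.4.1a) with respect to the ith element of the unknown parameters, $\theta$,

$$\frac{\partial SSE_T(\theta)}{\partial\theta_i} = 2\sum_{t=1}^{T} s_t\frac{\partial s_t}{\partial\theta_i}, \qquad \text{where} \quad \frac{\partial s_t}{\partial\theta_i} = -X_t\frac{\partial b_{t|t-1}}{\partial\theta_i}$$  (4.4.2a)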


To evaluate Eq (4.4.2a), we need to compute the following partial derivatives:

$$\frac{\partial b_{t|t-1}}{\partial\theta_i} = \frac{\partial M}{\partial\theta_i}b_{t-1} + M\frac{\partial b_{t-1}}{\partial\theta_i}$$  (4.4.2b)

$$\frac{\partial b_t}{\partial\theta_i} = \frac{\partial b_{t|t-1}}{\partial\theta_i} + \frac{\partial P_{t|t-1}}{\partial\theta_i}X_t'f_t^{-1}s_t - P_{t|t-1}X_t'f_t^{-2}\frac{\partial f_t}{\partial\theta_i}s_t + P_{t|t-1}X_t'f_t^{-1}\frac{\partial s_t}{\partial\theta_i}$$  (4.4.2c)

$$\frac{\partial P_{t|t-1}}{\partial\theta_i} = \frac{\partial M}{\partial\theta_i}P_{t-1}M' + M\frac{\partial P_{t-1}}{\partial\theta_i}M' + MP_{t-1}\frac{\partial M'}{\partial\theta_i} + \frac{\partial Q_t}{\partial\theta_i}$$  (4.4.2d)

$$\frac{\partial f_t}{\partial\theta_i} = X_t\frac{\partial P_{t|t-1}}{\partial\theta_i}X_t' + \frac{\partial h_t}{\partial\theta_i}$$  (4.4.2e)

and Burmeister et al. (1986) proposed the so-called 'big k' method, in which the Kalman filter is initialized with a state covariance matrix of the form kI. The big k can reflect a high degree of uncertainty about the initial state, but it is numerically inaccurate. Ansley and Kohn (1985) proposed a general algorithm which employs a data transformation to eliminate the dependence on the initial conditions. de Jong (1988, 1989) presumed that the initial state vector is composed of two components, one with a proper prior and the other with a diffuse prior, and developed an algorithm, which does not rely on a transformation of the data, by applying the Kalman filter to the initial state vector. In his 1991 paper, de Jong developed a numerically stable algorithm for filtering, likelihood evaluation, generalized least squares computation and smoothing, in which the diffuse Kalman filter is formulated in 'square root' form. He argues that the algorithm increases numerical accuracy and stability.


The evaluation of Eq (4.4.2d), in turn, requires the following partial derivatives:

$$\frac{\partial P_t}{\partial\theta_i} = \frac{\partial P_{t|t-1}}{\partial\theta_i} - \frac{\partial P_{t|t-1}}{\partial\theta_i}X_t'f_t^{-1}X_tP_{t|t-1} + P_{t|t-1}X_t'f_t^{-2}\frac{\partial f_t}{\partial\theta_i}X_tP_{t|t-1} - P_{t|t-1}X_t'f_t^{-1}X_t\frac{\partial P_{t|t-1}}{\partial\theta_i}$$  (4.4.2f)

[Equations (4.4.2g) and (4.4.2h), the partial derivatives of $h_t$ and $Q_t$ with respect to $\theta_i$, are illegible in the scanned source.]

which can be obtained from the prediction equations. Putting the above equations, Eq (4.4.2), all together will enable the gradient vector to be evaluated.

$$GRD(\theta) = \frac{\partial SSE_T(\theta)}{\partial\theta}$$  (4.4.2i)


4.4.3.  Computing  the  Hessian  Matrix 


As can be seen above, this optimization problem is non-linear, so we have to rely on an iterative procedure to estimate the unknown parameters. Most recently developed iterative methods require the evaluation of the Hessian matrix at each iteration. The element in the ith row and jth column of the Hessian matrix can be obtained by differentiating (4.4.2a) with respect to the jth element of $\theta$.

$$\frac{\partial^2 SSE_T(\theta)}{\partial\theta_i\partial\theta_j} = -2\sum_{t=1}^{T}\left[-\left(X_t\frac{\partial b_{t|t-1}}{\partial\theta_i}\right)\left(X_t\frac{\partial b_{t|t-1}}{\partial\theta_j}\right) + (y_t - X_tb_{t|t-1})X_t\frac{\partial^2 b_{t|t-1}}{\partial\theta_i\partial\theta_j}\right]$$  (4.4.3a)

In order to evaluate Eq (4.4.3a), we need the following calculations,

[Equations (4.4.3b) through (4.4.3i), the second-order partial derivatives of the filter recursions, are illegible in the scanned source.]

Combining all the partial derivatives given in Eq (4.4.2) and Eq (4.4.3) will enable the Hessian matrix to be computed.

$$HESS(\theta) = \frac{\partial^2 SSE_T(\theta)}{\partial\theta\,\partial\theta'}$$  (4.4.3j)


From Eq (4.4.2i) and Eq (4.4.3j), the iteration algorithm, proposed by Davidon (1959) and Fletcher & Powell (1963), for estimating the unknown parameters is given by

$$\theta = \theta_0 - \lambda_0\,HESS(\theta_0)^{-1}GRD(\theta_0)$$

where $\lambda_0$ is determined by a cubic interpolation of $SSE_T(\theta)$ along the current search direction.
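
To make the iteration concrete, here is a minimal Python sketch of a Davidon-Fletcher-Powell (DFP) type quasi-Newton loop. Two points are assumptions of the sketch rather than statements about the procedure used in this work: the inverse Hessian is approximated by the DFP update instead of being evaluated analytically, and the step length is chosen by simple backtracking (an Armijo rule) as a stand-in for the cubic interpolation mentioned above. The function name `dfp_minimize` and the test function are hypothetical.

```python
def dfp_minimize(f, grad, theta0, max_iter=200, tol=1e-8):
    """Quasi-Newton minimization with the DFP update of the inverse Hessian."""
    n = len(theta0)
    theta = list(theta0)
    H = [[float(i == j) for j in range(n)] for i in range(n)]   # start from the identity
    g = grad(theta)
    for _ in range(max_iter):
        if max(abs(v) for v in g) < tol:
            break
        # Search direction d = -H g.
        d = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking line search (substitute for cubic interpolation).
        step, f0 = 1.0, f(theta)
        slope = sum(gi * di for gi, di in zip(g, d))
        while (f([t + step * di for t, di in zip(theta, d)]) > f0 + 1e-4 * step * slope
               and step > 1e-12):
            step *= 0.5
        new_theta = [t + step * di for t, di in zip(theta, d)]
        new_g = grad(new_theta)
        s = [a - b for a, b in zip(new_theta, theta)]   # change in parameters
        y = [a - b for a, b in zip(new_g, g)]           # change in gradient
        sy = sum(a * b for a, b in zip(s, y))
        Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
        yHy = sum(a * b for a, b in zip(y, Hy))
        if sy > 1e-12 and yHy > 1e-12:                  # keeps H positive definite
            for i in range(n):
                for j in range(n):
                    H[i][j] += s[i] * s[j] / sy - Hy[i] * Hy[j] / yHy
        theta, g = new_theta, new_g
    return theta


# Illustrative quadratic with minimum at (1, -3).
def demo_f(v):
    return (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 3.0) ** 2


def demo_grad(v):
    return [2.0 * (v[0] - 1.0), 4.0 * (v[1] + 3.0)]


theta_hat = dfp_minimize(demo_f, demo_grad, [0.0, 0.0])
```

In the estimation problem of this chapter, `f` would be the sum of squared prediction errors and `grad` the gradient vector of Eq (4.4.2i).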


CHAPTER  5 

RESULTS  OF  EMPIRICAL  WORK 
5.1  Description  of  Data 

In this section, I describe the structure of the data set used in this paper. As can be seen in Eq (4.1.2), the current stock return index of the U.S. stock markets is explained by a linear combination of a constant, the current term spread (= long-term interest rate - short-term interest rate), the current inflation rate based on the producer price index, and the future growth rates of U.S. industrial production. All of the series are the same as the ones used in the 1990 Schwert paper and the 1990 Pagan & Schwert paper.

5.1.1.  Data  on  the  Monthly  Stock  Returns 

For the period 1889 to 1925, the monthly stock return index is measured by the capital gain returns to the Dow Jones composite index (1972) plus the dividend yield from the value-weighted portfolio of NYSE stocks constructed by the Cowles Commission (1939, pp. 168-169). For the period 1926 to 1987, it is equal to the monthly stock returns, including dividends, to the value-weighted portfolio of all NYSE stocks constructed by the Center for Research in Security Prices (CRSP).

^ I am very grateful to Dr. Adrian Pagan for sending me all the data sets.

5.1.2.  Data  on  the  Short-term  Interest  Rates 

For the period 1889 to 1925, the short-term interest rate is estimated using the four to six month commercial paper rates in New York from Macaulay (1938, Table 10, pp. A141-A161) and the result of a simple linear regression of CRSP yields on Macaulay yields from 1926 to 1937. For the period 1926 to 1987, it is equal to the monthly tax-free yield on the U.S. Government security which matures at the end of the month, taken from the Government Bond File constructed by CRSP.

5.1.3.  Data  on  the  Inflation  Rates 

For 1889, the inflation rate is measured by the monthly inflation rate of the Warren and Pearson (1933) index of U.S. producer prices. For the period 1890 to 1987, it is calculated from the not seasonally adjusted U.S. producer price index of the Bureau of Labor Statistics.

5.1.4. Data on the Industrial Production

For the period 1889 to 1918, the growth rate of industrial production is estimated using Babson's Index of the physical volume of business activity from Moore (1961, p. 130) and the average ratio of the Babson index to adjusted industrial production for 1919-1939. For the period 1919-1987, it is calculated from the index of industrial production of the Federal Reserve Board (1986) and Citibase (1986).

5.1.5.  Long-term  Interest  Rates  and  Term  Spreads 

For the period 1889 to 1918, the long-term interest rate is estimated from Macaulay's (1938, Table 10, pp. A141-A161) railroad bond yield index using its average ratio to the Moody's Aa bond yield. For the period 1919 to 1987, it is the Moody's Aa bond yield.

The  term  spread  is  defined  as  the  long-term  interest  rate 
minus  the  short-term  interest  rate. 

5.2  Results  of  Estimation  of  the  Model 
5.2.1.  Previous  Results  Based  on  the  Constant  Parameter  Models 

In this section, I summarize the previous results on the relationships between U.S. stock returns and the macroeconomic variables described above. This will be helpful for comparing our estimation results with the results from previous work on this topic.

First of all, a large fraction of the variation in stock returns can be accounted for by future real activities such as real GNP, industrial production, and investment (Fama, 1990). In particular, among the real activities, the future growth rate of industrial production has a significant and positive effect on current stock returns (Kaul & Seyhun, 1990; Shanken & Weinstein, 1990).

Second, during the post-war period, stock returns are negatively related to the inflation rate or, to be more precise, to relative price variability. The negative effect of relative price variability on future output, and thus on contemporaneous stock returns, is largely caused by the supply shocks in the 1970s (Kaul & Seyhun, 1990). In this paper, however, the inflation rate will be used as a proxy for relative price variability.

Third,  the  term  spread  is  positively  correlated  with 
expected  stock  returns  because  it  carries  some  information  on 
how  high  output  growth  is  anticipated  (Fama,  1990) . 

As will be seen in a later section, however, the relationships between stock returns and the three macroeconomic variables found in the previous work are valid only for certain time periods in this model.

5.2.2.  Results  of  Estimation  of  the  Model 

Originally, I attempted to estimate the general model described in Eq (4.1.2) and Eq (4.1.5) by the maximum likelihood estimation method. However, this involves very high computational costs. Instead, a simpler version of the time-varying parameter model, which does not allow the disturbance terms of the measurement and transition equations to follow an ARCH process, is estimated by the non-linear least squares method (NLLS). That is,

$$y_t = X_t\beta_t + \varepsilon_t$$
$$\beta_t = M\beta_{t-1} + u_t$$  (5.2.1)
$$\varepsilon_t \sim N(0, H) \quad \text{and} \quad u_t \sim N(0, Q)$$

where H is a positive scalar, M is a 4 by 4 matrix and Q is a positive definite 4 by 4 symmetric matrix. The optimal estimates of M, Q and H are obtained by minimizing the sum of squared prediction errors. That is,

$$\min_{\theta}\sum_{t=1}^{T}(y_t - y_{t|t-1})^2 = \min_{\theta}\sum_{t=1}^{T}(y_t - X_tb_{t|t-1})^2$$  (5.2.2)

I estimate Eq (5.2.2) by the iteration method proposed by Davidon, Fletcher and Powell, which is available in the Gauss program. The estimates of the 27 unknown parameters are summarized in Table 5.2.1. Unfortunately, most of the parameter estimates are not statistically different from zero even at the 10% significance level. However, the optimization procedure in the Gauss program depends on a numerical gradient vector and Hessian matrix of the objective function, which are obtained through a considerable amount of approximation. This procedure may inflate the estimated standard errors and thereby depress the t-statistics of the estimates.
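
The Ratio and Prob-value columns of Table 5.2.1 appear to be consistent with a ratio of estimate to standard error and a two-sided tail probability under the standard normal distribution. The short Python sketch below reproduces the M11 row under that assumption; the function name `t_ratio_and_pvalue` is hypothetical, and the use of the normal rather than a t distribution is inferred from the reported numbers, not stated in the text.

```python
import math


def t_ratio_and_pvalue(estimate, std_error):
    """Ratio of estimate to standard error and its two-sided normal tail probability."""
    ratio = estimate / std_error
    p_value = math.erfc(abs(ratio) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|ratio|))
    return ratio, p_value


# M11 row of Table 5.2.1: estimate 0.99477, S.E. 0.57769.
ratio, p_value = t_ratio_and_pvalue(0.99477, 0.57769)
print(round(ratio, 5), round(p_value, 4))
```

Since the ratio is inversely proportional to the standard error, any upward bias in the numerically approximated standard errors translates directly into smaller ratios and larger probability values.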

One feature of the parameter estimates is that each element of the state vector $\beta_t$ is largely determined by its own lagged value rather than by the lagged values of the other elements. The relatively high values of the estimates of M23, M32 and M42 suggest that there exists some degree of correlation among the coefficient on the term spread, that on the inflation rate and that on industrial production. The higher variance estimate for the coefficient on the inflation rate may be evidence that the relationship between the inflation rate and the stock return is more unstable than those between either the term spread or industrial production and the stock returns.


Table 5.2.1  Results of Estimation of the Kalman Filter Model

Parameter    Estimate       S.E.      Ratio   Prob-value
M11           0.99477    0.57769    1.72198      0.08507
M12           0.00164    0.09432    0.01738      0.98613
M13           0.00026    0.57119    0.00045      0.99964
M14          -0.00016    0.57656   -0.00027      0.99979
M21          -0.00150    0.57769   -0.00259      0.99793
M22           0.44081    5.38708    0.08183      0.93478
M23          -0.10221    0.99648   -0.10257      0.91830
M24           0.01043    0.66989    0.01557      0.98758
M31          -0.00372    0.57769   -0.00644      0.99486
M32          -0.10443    0.51498   -0.20278      0.83931
M33           0.95322    0.57633    1.65396      0.09814
M34          -0.00440    0.57745   -0.00761      0.99392
M41           0.00266    0.57769    0.00461      0.99632
M42           0.10656    0.31374    0.33966      0.73411
M43           0.02767    0.57299    0.04829      0.96148
M44           0.98295    0.57687    1.70392      0.08840
Q11           0.04200    0.57769    0.07270      0.94204
Q21          -0.07036    0.57769   -0.12180      0.90306
Q22           0.05432    0.57769    0.09403      0.92509
Q31          -0.00011    0.57769   -0.00019      0.99985
Q32           0.01140    0.57769    0.01973      0.98426
Q33           0.46500    0.57769    0.80493      0.42086
Q41           0.00042    0.57769    0.00072      0.99942
Q42          -0.03069    0.57769   -0.05312      0.95764
Q43          -0.00239    0.57769   -0.00413      0.99670
Q44           0.01256    0.57769    0.02174      0.98266
H             0.04200    0.57769    0.07270      0.94204

Based on the parameter estimates shown in Table 5.2.1, I compute the estimates of $\beta_t$ conditional on the information set available at time t, that is, the estimates defined in Eq (4.2.3a). Figure (5.1) shows the plot of the estimate of the constant term from the updating equation. The estimated series appears to be similar to that of the stock returns, but on a smaller scale.

In his 1990 paper, Fama argues that since a high term spread signals that there will be some improvement in output growth in the near future, and the expected stock return is positively correlated with output growth, the term spread has a positive correlation with stock returns. As can be seen in Figure (5.2), the path of the estimates of the coefficient on the term spread fluctuates at high frequency over time. This implies that the information the term spread carries is noisy. For example, during the Great Depression most firms in the U.S.A. probably suffered from insufficient funds. So they probably issued corporate bonds with high yields in order to relieve funding pressure rather than to undertake good projects during the recession. This high term spread occurred at almost the same time as the stock market crash. That is why during the Depression the term spread seems to be highly negatively correlated with stock returns. In addition, the fact that during the post-war period the term spread itself shows some features of a non-stationary process may explain why the coefficient estimates move around zero at high frequency. This may imply either that the relationship between the term spread and the stock return varies dynamically over time or that the term spread has little explanatory power over stock returns.

Figure 5.1: The estimates of the constant term of the updating equations using the U.S. monthly stock return indexes of the CRSP

Figure 5.2: The estimates of the coefficient on the term spread of the updating equations using the U.S. monthly stock return indexes of the CRSP

Compared with the term spread, the inflation rate seems to have a more stable relationship with the stock return. As can be seen in Figure (5.3), the path of the estimates of the coefficient on the inflation rate fluctuates less frequently than that of the term spread. Inflation caused by demand shocks in general tends to have favorable effects on firms' business because it usually accompanies economic expansion, while inflation caused by supply shocks tends to have unfavorable effects on firms' business because it usually occurs together with economic recessions. For example, the supply-side inflation around World War I, in the 1970s and in the early 1980s led the stock market down, so during these periods stock returns tend to be negatively related to inflation. But the deflation around the Great Depression also partly led the stock market down, and the inflation from the end of World War II until the Korean War produced an economic boom, so during these periods stock returns tend to be positively related to inflation. This partly supports the argument of Kaul and Seyhun (1990) that during the post-war period stock returns are negatively related to inflation rates, since they focus on the effects of supply-side inflation through 'relative price variability'.

Until the early 1960s, as shown in Figure (5.4), industrial production in general seems to be positively correlated with stock returns, even though there are some variations in the scale of the estimates of its coefficient. Since the early 1960s the U.S. economy appears to have rapidly lost strong incentives on the production side, while the stock markets experienced a kind of rush as large amounts of funds concentrated in the financial markets for the purpose of speculation. That is why during that period industrial production appears to be negatively related to stock returns.

Figure 5.3: The estimates of the coefficient on the inflation rate of the updating equations using the U.S. monthly stock return indexes of the CRSP

Figure 5.4: The estimates of the coefficient on the industrial production of the updating equations using the U.S. monthly stock return indexes of the CRSP

In summary, we should be very careful about the relationships between stock returns and the macroeconomic variables. There is little evidence that these relationships are stable or fixed over time.

The time-varying model employed in this thesis has relatively good explanatory power over the variations in stock returns. The $R^2$ is 0.92785 from the updating equations and 0.12698 from the prediction equations. In Figure (5.5) and Figure (5.6), the stock returns fitted by the updating equations and forecasted by the prediction equations, respectively, are plotted with the actual stock returns.

On the other hand, I estimate the volatility of the stock returns by the squares of the prediction errors, and plot the results in Figure (6.1). This figure supports the results found in the previous literature that the volatility of stock returns varies over time and becomes high around economic recessions, oil shocks, and banking panics.

Figure 5.5: The U.S. monthly stock returns and their fitted series by the updating equations using the U.S. monthly stock return indexes of the CRSP

Figure 5.6: The U.S. monthly returns and their forecasted series by the prediction equations using the U.S. monthly stock return indexes of the CRSP

5.2.3. Test on Parameter Constancy

In this section, the constancy of the parameter $\beta_t$ over time will be tested using the likelihood ratio statistic.

If the hypothesis that the coefficients remain constant is true, then the M matrix in the transition equation must be an identity matrix and the variance of $u_t$ must be equal to zero. As a result, the null and alternative hypotheses for testing the constancy of coefficients are given by

$H_0$: $\beta_t = \beta_{t-1} = \cdots = \beta$, i.e., M is an identity matrix and $u_t$ has zero variance, $t = 1, 2, \ldots, T$

$H_a$: Either M is not an identity matrix or $u_t$ has at least one non-zero variance

The null hypothesis implies a simple linear model with constant coefficients, i.e., $\beta$ can be estimated by OLS.

The hypotheses described above can be expressed in a more general form as below.

$$H_0: R\theta - \psi = 0$$
$$H_a: R\theta - \psi \neq 0$$  (5.2.3)

where R is the n x k known matrix of constants, $\theta$ is the k x 1 vector of unknown parameters, and $\psi$ is the n x 1 vector.

In order to get a test statistic for the parameter constancy test, we need to estimate the model described in the previous section. This is done by maximizing the following log-likelihood function with respect to the unknowns:

$$\mathrm{Log}L = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\ln(f_t) - \frac{1}{2}\sum_{t=1}^{T}\frac{(y_t - X_tb_t)^2}{f_t}$$

where

$$f_t = X_tP_tX_t' + H.$$

In estimating the model under the null, we utilize information on both $y_t$ and $X_t$ simultaneously. Thus for the parameter constancy test it is more reasonable to construct the log-likelihood function based on the updating equations rather than on the prediction equations. The MLE result is summarized in Table (5.2.2).

Three useful misspecification tests have been proposed which can be applied to this kind of case: the likelihood ratio test (LR test), the Wald test (W test) and the Lagrange multiplier test (LM test). Let $\theta_0$ and $\theta_a$ denote the restricted and unrestricted estimates of $\theta$, respectively. First, the LR test is based upon a comparison of the supremum of $L(\theta)$ under the null hypothesis with that under the alternative hypothesis. The LR test statistic is given by

$$LR = 2[L(\theta_a) - L(\theta_0)]$$  (5.2.5a)

Applying the Taylor expansion to $L(\theta_a)$ around $\theta_0$ and ignoring the terms of higher than second order, we obtain

$$L(\theta_a) \approx L(\theta_0) + (\theta_a - \theta_0)'H(\theta_a)(\theta_a - \theta_0)/2$$  (5.2.5b)

where $H(\theta_a)$ is the Hessian matrix evaluated at the unrestricted estimates of $\theta$. Rearranging Eq (5.2.5b) gives

$$(\theta_a - \theta_0)'H(\theta_a)(\theta_a - \theta_0) \approx 2[L(\theta_a) - L(\theta_0)]$$

where, under the null hypothesis, $(\theta_a - \theta_0)'H(\theta_a)(\theta_a - \theta_0)$ is known to be asymptotically $\chi^2$-distributed with n degrees of freedom, and so is $2[L(\theta_a) - L(\theta_0)]$.


Table 5.2.2  Results of Estimation of the Model under $H_a$

Parameter    Estimate       S.E.        Ratio   Prob-value
M11           0.99836    0.02000     49.92786      0.00000
M12          -4.68223    0.01998   -234.30924      0.00000
M13           0.40468    0.01934     20.92403      0.00000
M14           0.31499    0.01990     15.82763      0.00000
M21          -0.00144    0.01996     -0.07203      0.94258
M22           0.34642    0.01998     17.33508      0.00000
M23          -0.10078    0.01990     -5.06295      0.00000
M24           0.01257    0.01993      0.63067      0.52826
M31          -0.00388    0.01987     -0.19546      0.84503
M32           0.00075    0.01990      0.03791      0.96976
M33           0.92057    0.01990     46.25440      0.00000
M34          -0.01991    0.01993     -0.99880      0.31789
M41           0.00275    0.01993      0.13796      0.89028
M42          -0.07199    0.01989     -3.61879      0.00030
M43           0.05017    0.01931      2.59855      0.00936
M44           0.94769    0.01817     52.16018      0.00000
Q11           0.04200    0.01977      2.12486      0.03360
Q21           0.14533    0.01996      7.28045      0.00000
Q22           0.05471    0.01996      2.74096      0.00613
Q31          -0.01681    0.02001     -0.83977      0.40104
Q32           0.01170    0.01988      0.58844      0.55624
Q33           0.41565    0.01999     20.79626      0.00000
Q41          -0.01301    0.01993     -0.65283      0.51387
Q42          -0.03717    0.02001     -1.85784      0.06319
Q43          -0.06195    0.02001     -3.09608      0.00196
Q44           0.34269    0.01985     17.26082      0.00000
H             0.00003    0.00000      8.65876      0.00000

The LR test requires relatively high computational cost since the
model must be estimated under both the null and alternative
hypotheses.

Given the fact that $\theta_a$ is consistent for $\theta$ and, under $H_0$,
$R\theta - \psi = 0$ so that $R\theta_a - \psi$ tends to be a null vector,
the W test is a test of the joint significance of the elements of
$R\theta_a - \psi$. Using the fact that $R\theta_a - \psi$ is asymptotically
normally distributed with mean zero and variance $RA^{-1}R'$, where
A is the information matrix evaluated at the unrestricted
estimates of $\theta$, we can obtain

$W = (R\theta_a - \psi)' (RA^{-1}(\theta_a)R')^{-1} (R\theta_a - \psi) \sim \chi^2(n)$  (5.2.5c)

using only the estimates obtained under the alternative hypothesis.

Using the results shown in Table 5.2.2, we can compute the
LR test statistic.

$LR = 2[L(\theta_a) - L(\theta_0)]$
   = 4399.10 - 1754.69
   = 2644.41 > 45.64

The upper-tail critical value from the chi-square distribution
with 26 degrees of freedom at the 1 % significance level is 45.64,
so the null hypothesis is rejected. That is, there is little
statistical evidence to say that the constant coefficient
model works better than the time-varying coefficient model.
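
As a rough numerical check of the LR test above, the following Python
sketch recomputes the statistic and approximates the upper-tail 1 %
chi-square critical value for 26 degrees of freedom. It assumes that
4399.10 and 1754.69 are the doubled log-likelihoods under the
alternative and the null, as the arithmetic in the text suggests. The
function names are illustrative only, and the Wilson-Hilferty
approximation is used here in place of a tabulated critical value.

```python
from statistics import NormalDist


def lr_statistic(two_loglik_alt, two_loglik_null):
    """LR = 2[L(theta_a) - L(theta_0)], given the two doubled log-likelihoods."""
    return two_loglik_alt - two_loglik_null


def chi2_upper_critical(df, alpha):
    """Approximate upper-tail chi-square critical value (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    term = 2.0 / (9.0 * df)
    return df * (1.0 - term + z * term ** 0.5) ** 3


# Values taken from the text; their reading as 2L(.) is an assumption.
lr = lr_statistic(4399.10, 1754.69)
crit = chi2_upper_critical(df=26, alpha=0.01)
reject_null = lr > crit

print(f"LR = {lr:.2f}, approximate 1% critical value = {crit:.2f}")
print("reject H0 (constant coefficients):", reject_null)
```

Because the statistic is far above any plausible critical value, the
conclusion does not depend on the accuracy of the approximation.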

5.3  Estimation  of  the  Alternative  Models  of  the  Stock  Returns 

Let us consider alternative stock return generating
structures. Many constant coefficient models have been
proposed. Among them, OLS models, GARCH models and ARIMA
models have been widely employed in the literature. The next
step is therefore to estimate these three models and compare
their performance with that of the model in this thesis.

5.3.1. OLS Model of Stock Returns

Suppose that

$Y_t = X_t \beta + \epsilon_t$

and $\epsilon_t \sim N(0, \sigma^2)$. Then the OLS estimator of $\beta$ is the
best linear unbiased estimator (BLUE).


Table 5.3.1. Results of Estimation of the OLS Model

Variable    Estimate    Standard Error    t-statistic
Constant    0.00450     0.00234           1.92649
TERM        1.35845     1.15867           1.17241
USPPI       0.14397     0.10839           1.32826
USIP        0.49024     0.06310           7.76904

The estimation results are summarized in Table 5.3.1. None of
the estimates of the coefficients on the macroeconomic
variables, except that on the growth rate of industrial
production, is statistically significant at the 5 % level. The
positive correlation between the stock returns and the term
spreads or the industrial production is compatible with results
in previous studies, but that between the inflation rates and
the stock returns is not. This simple model explains only about
6 % of the variation in the stock returns. The actual stock
returns and the estimates by the OLS model are plotted in
Figure (5.7). As can be seen in that figure, the OLS estimates
rarely follow the movement of the actual stock returns in
magnitude or in direction. Its explanatory power is therefore
very low.
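
The significance statements above can be checked directly from the
estimates and standard errors in Table 5.3.1. The sketch below is an
illustration only: it uses the large-sample normal critical value in
place of the Student t value (the sample size is not restated in this
section), and the variable and function names are hypothetical.

```python
from statistics import NormalDist

# (estimate, standard error) pairs copied from Table 5.3.1
ols_results = {
    "Constant": (0.00450, 0.00234),
    "TERM": (1.35845, 1.15867),
    "USPPI": (0.14397, 0.10839),
    "USIP": (0.49024, 0.06310),
}


def t_ratio(estimate, std_error):
    """Ratio of a coefficient estimate to its standard error."""
    return estimate / std_error


# Two-sided 5 % critical value under the normal approximation (about 1.96)
critical = NormalDist().inv_cdf(0.975)

significant = {
    name: abs(t_ratio(b, se)) > critical
    for name, (b, se) in ols_results.items()
}

for name, (b, se) in ols_results.items():
    print(f"{name:8s} t = {t_ratio(b, se):8.4f}  significant: {significant[name]}")
```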

Figure 5.7: The U.S. monthly stock returns and their fitted series by the OLS model using
the U.S. monthly stock return indexes of the CRSP

5.3.2. GARCH(p,q) Models of Stock Returns

In the generalized autoregressive conditional
heteroskedasticity (GARCH) model, proposed by Engle (1982) and
generalized by Bollerslev (1986), the current conditional
variance is determined by squares of lagged estimated errors
and lagged conditional variances. So far, there have been
several versions of the GARCH model with different assumptions
on the functional form of the conditional variance. Since the
GARCH model can produce good explanatory power for series
whose variances vary over time, it has been applied widely to
financial data such as stock market data and foreign exchange
rate data.

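The GARCH model to be estimated in this section is as
follows:

$y_t = C + \Gamma_1 TERM_t + \Gamma_2 USPPI_t + \Gamma_3 USIP_t + \theta h_t + \epsilon_t, \quad \epsilon_t \sim N(0, h_t)$

where

$h_t = \alpha_0 + \sum_{i=1}^{3} \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^{3} \beta_j h_{t-j}$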

Table 5.3.2 shows that stock returns are insignificantly
negatively related to the term spreads and positively to the
inflation rates. Both results seem to contradict the results
reported in previous studies in this area. This model also
yields a statistically significant positive correlation
between the stock returns and the growth rates of industrial
production. But the current conditional variance appears to
have no explanatory power over the current stock return since
the estimate of $\theta$ is equal to zero. The estimates of the
coefficients of the conditional variance equation imply that
only the one-month-lagged squared error has a significant
effect on the current conditional variance. This model also
yields a low $R^2$ of 0.0329.
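
To make the conditional variance equation concrete, the sketch below
iterates a GARCH(3,3) recursion in which the current variance depends
on three lagged squared errors and three lagged conditional variances.
The point estimates are taken from Table 5.3.2, reading the three
coefficients listed after the alphas as the coefficients on lagged
variances (an assumption). The residual series, the start-up values
and the function name are hypothetical and are not the thesis data.

```python
def garch_variance(residuals, alpha0, alphas, betas, h_init):
    """Conditional variances h_t = alpha0 + sum_i alphas[i-1] * e_{t-i}^2
    + sum_j betas[j-1] * h_{t-j}, with pre-sample variances set to h_init."""
    p, q = len(alphas), len(betas)
    start = max(p, q)
    h = [h_init] * start  # start-up values are an assumption of this sketch
    for t in range(start, len(residuals)):
        arch = sum(alphas[i] * residuals[t - 1 - i] ** 2 for i in range(p))
        garch = sum(betas[j] * h[t - 1 - j] for j in range(q))
        h.append(alpha0 + arch + garch)
    return h


# Point estimates from Table 5.3.2
alpha0 = 0.00172
alphas = [0.00860, 0.02441, 0.00375]
betas = [0.196653, 0.000000, 0.072707]

# Unconditional variance implied by the estimates
uncond = alpha0 / (1.0 - sum(alphas) - sum(betas))

# Hypothetical residuals whose squares equal the unconditional variance:
# the recursion should then stay at that level.
residuals = [uncond ** 0.5] * 12
h = garch_variance(residuals, alpha0, alphas, betas, h_init=uncond)

print(f"implied unconditional variance: {uncond:.6f}")
print(f"last conditional variance:      {h[-1]:.6f}")
```

The implied unconditional variance is of the same order as the mean
squared errors reported for the constant coefficient models.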


Table 5.3.2 Results of Estimation of the GARCH(3,3) Model

Parameter     Estimate     Standard Error    t-statistic
C              0.00636     0.03854            0.16489
TERM          -0.78556     1.04670           -0.75051
USPPI          0.10547     0.10252            1.02876
USIP           0.47949     0.05419            8.84806
$\theta$       0.00000     0.77963            0.00000
$\alpha_0$     0.00172     0.01099            0.15610
$\alpha_1$     0.00860     0.00413            2.08373
$\alpha_2$     0.02441     0.06674            0.36573
$\alpha_3$     0.00375     0.17334            0.02163
$\beta_1$      0.196653    7.51483            0.02617
$\beta_2$      0.000000    2.53755            0.00000
$\beta_3$      0.072707    0.53582            0.13569

The actual stock returns and the estimates by the
GARCH(3,3) model are plotted in Figure (5.8). Like the OLS
model, this model does not produce estimates that follow the
actual series closely.


Figure 5.8: The U.S. monthly stock returns and their fitted series by the GARCH(3,3)
model using the U.S. monthly stock return indexes of the CRSP

5.3.3. ARIMA(p,d,q) Models of Stock Returns

The general form of an autoregressive integrated moving
average process of orders p, d and q is defined as below:

$\phi(L) \nabla^d y_t = \theta_0 + \theta(L) e_t$  (5.3.3a)

with

$\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p$  (5.3.3b)

$\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$  (5.3.3c)

where $\nabla^d y_t$ is the dth difference of $y_t$ and L is a lag operator.

As the first step in estimating the model, we have to
choose appropriate values of p, d and q. In order to check
whether $\{y_t\}$ is non-stationary or stationary, we need to
perform the so-called unit root test on $\{y_t\}$, because if
$\{y_t\}$ is non-stationary, the regression will be spurious, so
that $\{y_t\}$ should be differenced until it becomes stationary.
The unit root test proposed by Dickey and Fuller (1979) is to
regress $\nabla y_t$ on a constant, time (t) and $y_{t-1}$ and then to
test the null hypothesis that the coefficient on $y_{t-1}$ is
equal to zero. That is,

$\nabla y_t = \alpha_0 + \alpha_1 t + \alpha_2 y_{t-1} + e_t$  (5.3.3d)

where $\nabla y_t = y_t - y_{t-1}$ and $e_t$ is a disturbance term. The
hypotheses to be tested are given by

$H_0: \alpha_2 = 0$

$H_a: \alpha_2 < 0$

Table 5.3.3a shows that the null hypothesis is rejected at the
1 % significance level since the test statistic is lower than
the critical value of -29.5. It follows that the monthly
stock returns are stationary in mean, so that there is no need
to difference the stock return series. In other words, the
appropriate value of d is equal to zero.
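
The Dickey-Fuller regression (5.3.3d) can be sketched in a few lines
of Python. The example below regresses the first difference on a
constant, a time trend and the lagged level, and returns the estimate
and t-ratio of the coefficient on the lagged level. The helper names
`ols` and `dickey_fuller` are hypothetical, the series is simulated
white noise rather than the CRSP returns, and the resulting statistic
must be compared with Dickey-Fuller critical values rather than with
Student t values.

```python
import random


def ols(X, y):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan inversion of X'X with partial pivoting
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(xtx)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    inv = [row[k:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    se = [(s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


def dickey_fuller(y):
    """Regress the first difference of y on a constant, a trend and y_{t-1}."""
    X = [[1.0, float(t), y[t - 1]] for t in range(1, len(y))]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    beta, se = ols(X, dy)
    return beta[2], beta[2] / se[2]


random.seed(0)
returns = [random.gauss(0.005, 0.05) for _ in range(600)]  # simulated data
a2, t_stat = dickey_fuller(returns)
print(f"alpha_2 = {a2:.4f}, t-ratio = {t_stat:.2f}")
```

For a stationary series such as this simulated one, the estimate of
the coefficient on the lagged level is close to -1 and its t-ratio is
strongly negative, which is the pattern reported in Table 5.3.3a.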

Table 5.3.3a Results of the Dickey-Fuller Test

Parameter     Estimate     Standard Error    t-Statistic
$\alpha_0$     0.005600    0.003043            1.83897
$\alpha_1$     0.000003    0.000004            0.77841
$\alpha_2$    -0.908638    0.028986          -31.34750

The next step before estimating Equation (5.3.3a) is to
determine suitable values of p and q. Three criteria are most
widely used for selecting the most appropriate values of p and
q. The earliest selection criterion is the Akaike Information
Criterion (AIC), which is defined as

$AIC(p,q) = \ln(s^2) + 2(p + q)/T$

where $s^2$ is the estimate of the error variance of the
ARMA(p,q) model fitted to $z_t = \nabla^d y_t$. In general, however,
the AIC tends to overparameterize the model. A second criterion,
proposed by Rissanen (1978) and Schwarz (1978), is given by

$BIC(p,q) = \ln(s^2) + (p + q) T^{-1} \ln(T)$.

A third criterion, proposed by Hannan (1980), is defined as

$PIC(p,q) = \ln(s^2) + (p + q) c T^{-1} \ln(\ln(T))$

where c is a constant.

Optimal values of p and q are those at which each of the
criteria is minimized. Hannan shows that, unlike the AIC, the
BIC and the PIC are strongly consistent in that they determine
the true model asymptotically. So in many cases, either the
BIC or the PIC has been used in preference to the AIC.
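
The three criteria differ only in their penalty on the number of
parameters, which the following sketch makes explicit. The error
variances, the sample size T = 1000 and the default c = 2 are
hypothetical values chosen for illustration, and `select_order` is an
illustrative helper, not part of the original computations.

```python
import math


def aic(s2, p, q, T):
    return math.log(s2) + 2.0 * (p + q) / T


def bic(s2, p, q, T):
    return math.log(s2) + (p + q) * math.log(T) / T


def pic(s2, p, q, T, c=2.0):
    # c = 2 is an assumed value of the constant in the Hannan criterion
    return math.log(s2) + (p + q) * c * math.log(math.log(T)) / T


def select_order(variances, T, criterion):
    """Return the (p, q) pair minimizing the criterion.

    `variances` maps (p, q) to the estimated error variance of that fit."""
    return min(variances,
               key=lambda pq: criterion(variances[pq], pq[0], pq[1], T))


# Hypothetical error variances for three candidate ARMA(p, q) fits
variances = {(1, 0): 0.00280, (2, 2): 0.00270, (8, 2): 0.00266}
T = 1000

best_aic = select_order(variances, T, aic)
best_bic = select_order(variances, T, bic)
best_pic = select_order(variances, T, pic)
print("AIC:", best_aic, "BIC:", best_bic, "PIC:", best_pic)
```

With these illustrative inputs the AIC prefers the larger model while
the BIC and the PIC prefer the smaller one, because their penalties
per parameter are heavier once ln(T) exceeds 2.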

Table A.1 in the Appendix shows that both the BIC and the PIC
are minimized when p = 2 and q = 2 whereas the AIC is minimized
when p = 8 and q = 2. These results are consistent with
Hannan's argument. Combining the result of the unit root test
with either the BIC or the PIC suggests that the ARIMA(2,0,2)
model of the stock returns is most appropriate. That is, the
regression equation is defined as

$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2}$

and its estimation results are summarized in Table 5.3.3b.


Table 5.3.3b Results of Estimation of the ARIMA(2,0,2) Model

Parameter     Estimate     Standard Error    t-statistic
$\phi_1$      -0.382716    0.088086           -4.34480
$\phi_2$      -0.655847    0.078728           -8.33052
$\theta_1$    -0.508690    0.077723           -6.54492
$\theta_2$    -0.760870    0.065652          -11.5894

Table 5.3.3b shows that the current stock return is negatively
correlated with the two lagged stock returns and with the
shocks to the stock returns over the past two months. This
model also yields very noisy estimates of the stock returns, so
the $R^2$ is equal to only 0.0349. In Figure (5.9), we can see
that the estimates by the ARIMA(2,0,2) model are more volatile
than the actual stock returns.

Figure 5.9: The U.S. monthly stock returns and their fitted series by the ARIMA(2,0,2)
model using the U.S. monthly stock return indexes of the CRSP

5.3.4. Comparison between the Performances of the Varying
Parameter Model and the Alternative Models

In this section, I compare the overall performance of the
time-varying parameter model of the stock returns with that of
the alternative models of the stock returns.

As shown in Table (5.3.4), our model explains almost 93 % of
the variation in the stock returns when the current stock
return is in the known information set, and nearly 12.7 % of
the variation when only the information about stock returns up
to the previous month is available. None of the alternative
models can explain more than 6 % of the stock return
variation.

The value of the estimated mean squared error (MSE) is also
commonly used as a measure of model performance. The estimated
mean squared error calculated from our model is almost half of
those obtained from the alternative models. Thus these two
measures suggest that our model works better than the
alternative models which assume time-invariant parameters.
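
The two performance measures used in this comparison can be sketched
as follows. The definitions below are the conventional ones and are an
assumption, since the formulas are not restated in this section; the
MSE values are copied from Table 5.3.4, and the small data set at the
end is purely illustrative.

```python
def mean_squared_error(actual, fitted):
    """Average squared difference between actual and fitted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted)) / len(actual)


def r_squared(actual, fitted):
    """One minus the ratio of residual to total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Estimated mean squared errors reported in Table 5.3.4
mse_prediction = 0.001374
mse_alternatives = {
    "OLS": 0.002587,
    "GARCH(3,3)": 0.002598,
    "ARIMA(2,0,2)": 0.002719,
}
mse_ratios = {name: mse_prediction / v for name, v in mse_alternatives.items()}
for name, ratio in mse_ratios.items():
    print(f"Kalman prediction MSE / {name} MSE = {ratio:.3f}")

# Illustrative data only, to show the two functions in use
actual = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
example_mse = mean_squared_error(actual, fitted)
example_r2 = r_squared(actual, fitted)
```

Each ratio is slightly above one half, which is the sense in which the
mean squared error of the Kalman filter model is almost half of those
of the alternative models.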

Table 5.3.4. Comparison between the Alternative Models

                            Estimated
Models                      Mean Squared Errors    $R^2$
Kalman Filter Model
  Updating Equations        0.001393               0.92785
  Prediction Equations      0.001374               0.12698
OLS Model                   0.002587               0.05964
GARCH(3,3) Model            0.002598               0.05672
ARIMA(2,0,2) Model          0.002719               0.03290

5.4 Implications of the Kalman Filter Model on Dynamic
Market Efficiency

In an efficient stock market, the asset prices of a
specific firm fully and instantaneously reflect all available
relevant information concerning the firm's business, and so do
stock returns. It follows that an investor can form a more
accurate expectation of a current stock return when he has
access to some information concerning the firm's future
business than when he relies only on the past history of the
firm's business. However, once new information on the firm's
business is released, it will be fully and instantaneously
incorporated into the firm's stock returns if stock markets
are informationally efficient. So an expectation of the
current stock return based on the information available up to
now should not differ from that based on the cumulative set of
information which has already been released.

In other words, a stock market is said to be dynamically
efficient if

$V(e_{t|t-1}) > V(e_t) = V(e_{t|T})$ for almost all $t < T$

where $e_{t|t-1} = y_t - x_t b_{t|t-1}$, $e_t = y_t - x_t b_t$ and
$e_{t|T} = y_t - x_t b_{t|T}$.

The inequality between the first two variances is an obvious
result obtained from the Kalman filter.

$e_t = y_t - x_t [b_{t|t-1} + P_{t|t-1} x_t' f_t^{-1} (y_t - x_t b_{t|t-1})]$

$    = [1 - x_t P_{t|t-1} x_t' f_t^{-1}] (y_t - x_t b_{t|t-1})$

Note that $f_t = x_t P_{t|t-1} x_t' + h_t$, as defined in Eq (4.2.3c),
is the variance of the prediction error, $e_{t|t-1}$, so both terms
are nonnegative definite. It follows that

$0 < (1 - x_t P_{t|t-1} x_t' f_t^{-1}) \le 1$

Thus the variance of the term $e_t$ is smaller than that of the
other term, $e_{t|t-1}$.
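
The shrinkage of the updating residual relative to the prediction
error can be illustrated for a single period. The sketch below assumes
a scalar observation with a two-element regressor vector; all numbers
and the function name are hypothetical, and h stands for the
measurement-error variance that enters the prediction-error variance.

```python
def updating_residual(y, x, b_pred, P_pred, h):
    """One Kalman updating step for a scalar observation.

    Returns (prediction error, updating residual, shrink factor), where
    the shrink factor is 1 - x P x' / f and f = x P x' + h."""
    k = len(x)
    Px = [sum(P_pred[i][j] * x[j] for j in range(k)) for i in range(k)]  # P x'
    xPx = sum(x[i] * Px[i] for i in range(k))                            # x P x'
    f = xPx + h                                   # prediction-error variance
    pred_err = y - sum(x[i] * b_pred[i] for i in range(k))
    gain = [Px[i] / f for i in range(k)]          # Kalman gain P x' / f
    b_upd = [b_pred[i] + gain[i] * pred_err for i in range(k)]
    upd_err = y - sum(x[i] * b_upd[i] for i in range(k))
    return pred_err, upd_err, 1.0 - xPx / f


# Hypothetical values for one period
y = 0.02
x = [1.0, 0.5]
b_pred = [0.005, 0.01]
P_pred = [[0.04, 0.01], [0.01, 0.09]]
h = 0.05

pred_err, upd_err, factor = updating_residual(y, x, b_pred, P_pred, h)
print(f"prediction error  = {pred_err:.6f}")
print(f"updating residual = {upd_err:.6f}")
print(f"shrink factor     = {factor:.6f}")
```

The updating residual equals the prediction error multiplied by the
shrink factor, which lies between zero and one whenever h is positive
and P is nonnegative definite.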

Testing the dynamic efficiency of stock markets involves
the joint test of the following three hypotheses:

Test 1:

$H_0: V(e_t) = V(e_{t|t-1})$

$H_1: V(e_t) < V(e_{t|t-1})$

Test 2:

$H_0: V(e_{t|T}) = V(e_{t|t-1})$

$H_1: V(e_{t|T}) < V(e_{t|t-1})$

Test 3:

$H_0: V(e_t) = V(e_{t|T})$

$H_1: V(e_t) > V(e_{t|T})$

For simplicity, let us assume that the estimated errors, $e_t$,
$e_{t|t-1}$ and $e_{t|T}$, are independently distributed with mean zero
and that there exists f such that $f_t$ converges to f as t goes
to infinity. I perform the conventional F-test on those
hypotheses. The test statistics are defined as
$V(e_t)/V(e_{t|t-1})$, $V(e_{t|T})/V(e_{t|t-1})$ and $V(e_t)/V(e_{t|T})$,
respectively. In the first two tests, the null is rejected if
the test statistic is less than the critical value, whereas in
the third test the null is rejected if the test statistic
exceeds the critical value. Table (5.4) shows that in the first
two tests the null hypotheses are rejected but in the third
test the null hypothesis is not rejected at the 5 percent
level. Therefore we can conclude that there is little evidence
that stock markets are informationally inefficient.
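
The decision rule for the three variance-ratio tests can be written
out as a short sketch. The statistics and critical values are copied
from Table 5.4; the helper names are hypothetical, and the variance
estimator assumes zero-mean errors, as stated in the text.

```python
def variance(errors):
    """Variance estimate for errors assumed to have mean zero."""
    return sum(e * e for e in errors) / len(errors)


def variance_ratio(numerator_errors, denominator_errors):
    """F-type statistic: ratio of two zero-mean error variances."""
    return variance(numerator_errors) / variance(denominator_errors)


def reject(stat, crit, tail):
    """Lower-tail tests reject for small ratios, upper-tail for large ones."""
    return stat < crit if tail == "lower" else stat > crit


# (name, test statistic, critical value, rejection tail) from Table 5.4
tests = [
    ("Test 1", 0.1427, 0.9341, "lower"),
    ("Test 2", 0.2396, 0.9341, "lower"),
    ("Test 3", 0.5955, 1.0731, "upper"),
]

decisions = {name: reject(stat, crit, tail) for name, stat, crit, tail in tests}
for name, rejected in decisions.items():
    print(name, "reject H0" if rejected else "do not reject H0")
```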


Table 5.4 Results of Market Efficiency Tests

test      test statistic (F)    critical value    reject or not
Test 1    0.1427                0.9341            reject $H_0$
Test 2    0.2396                0.9341            reject $H_0$
Test 3    0.5955                1.0731            not reject $H_0$

CHAPTER  6 

IMPLICATIONS  ON  THE  CHARACTERISTICS  OF  THE  STOCK  RETURNS 
6.1  Volatility  of  Stock  Returns 

First, our estimation results strongly support the view that
the volatility of stock returns varies over time; in
particular, it tends to be very high during wars, economic
recessions, oil shocks and banking panics. In this work,
volatilities of stock returns are measured by the squares of
prediction errors. Figure 6.1 clearly shows that the
volatilities of stock returns vary over time. For example, the
high volatility of stock returns around the 1910s is due to
World War I, that around the 1930s is caused by the Great
Depression, and that since the 1970s is ascribed to the oil
shocks and the banking panic. In particular, the high
volatilities which occurred around the Depression and the
1970s were very large in magnitude and lasted for a relatively
long time.

Figure 6.1: The volatilities of the U.S. monthly stock returns measured by the squares of
the estimated residuals of the prediction equations

Second, our results also support the result, reported by
Nelson (1989), French, Schwert, and Stambaugh (1987), or
Poterba and Summers (1986), that shocks to volatility of stock
returns are persistent with declining AR coefficients. As a
test of persistence of shocks to volatility, I regress current
squared prediction errors on the lagged squared prediction
errors. That is,

$\varepsilon_t^2 = a_0 + a_1 \varepsilon_{t-1}^2 + a_2 \varepsilon_{t-2}^2 + a_3 \varepsilon_{t-3}^2 + a_4 \varepsilon_{t-4}^2 + a_5 \varepsilon_{t-5}^2 + a_6 \varepsilon_{t-6}^2 + a_{12} \varepsilon_{t-12}^2 + a_{24} \varepsilon_{t-24}^2$  (6.1.1a)


The regression results of Eq (6.1.1a) are summarized in
Table (6.1.1a).

Table 6.1.1a Regression of the Volatility Measure

Variable    Estimated Coefficient    Standard Error    t-statistic
$a_0$        0.000179                0.000044           4.03852
$a_1$        0.210394                0.029938           7.02771
$a_2$        0.094817                0.031219           3.03715
$a_3$        0.110118                0.032430           3.39556
$a_4$        0.022449                0.031232           0.71879
$a_5$       -0.021027                0.031091          -0.67630
$a_6$        0.011853                0.030259           0.39170
$a_{12}$     0.159586                0.031322           5.09507
$a_{24}$     0.075936                0.029374           2.58512

The estimates of the AR coefficients decline with some
seasonality: they decrease smoothly, so that the volatilities
lagged four months or more become statistically negligible, but
the coefficient estimate of the twelve-month-lagged volatility
again becomes statistically significant. I find that it takes
almost 0.44 months for a shock to volatility to become half as
strong as the original one when the formula $\rho^h = 1/2$ is
applied to the largest AR root in Table (6.1.1a). This appears
to imply that the shocks to the volatility of stock returns
fade out very quickly but have some seasonality. Note that
Nelson (1989) shows that the half-life h of a shock associated
with the largest root is about 7.3 years, which he obtains
using daily data instead of monthly data. But this long
persistence might stem from the misspecification of his model.
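
The half-life figure can be reproduced by solving the half-life
formula for h, that is, h = ln(1/2)/ln(rho). The sketch below takes
rho to be the largest AR coefficient in Table 6.1.1a, 0.210394, which
is an assumption about how the reported value was obtained.

```python
import math


def half_life(rho):
    """Number of periods h solving rho**h = 1/2, for 0 < rho < 1."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(rho)


rho = 0.210394  # coefficient on the one-month-lagged squared error
h_months = half_life(rho)
print(f"half-life = {h_months:.2f} months")
```

With this value of rho the result is about 0.44 months, in line with
the figure quoted in the text.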

Third, a low $R^2$ equal to 0.177 is obtained from the
regression of Eq (6.1.1a). This means that only a small
portion of the variation in the current volatility can be
explained by the lagged ones.

6.2  Stock  Returns  and  Economic  Activity 

First, our results in this work support the previous
results, such as French, Schwert and Stambaugh (1987),
Campbell and Shiller (1988), Fama and French (1989a, b),
Turner, Startz and Nelson (1989), and Ball and Kothari (1989),
that expected stock returns are time-varying. Two different
types of expected stock returns are obtained from information
sets of different sizes and are plotted in Figure (5.5) and
Figure (5.6), respectively. Figure (5.5) shows current
expected stock returns based on the currently available
information about stock returns while Figure (5.6) plots them
based on one-month-lagged information. It is not difficult to
see from these two figures that both series vary over time.

Second, stock returns are, to some extent, predictable.
Our model produces relatively high predictive power over stock
returns. As shown in Table (5.3.4), nearly thirteen percent of
the variation in the monthly stock return index can be
explained by the model employed in this work. Fama and French
(1988a) argue that the predictable variation of stock returns
stems from a slowly mean-reverting component of stock prices
and then, using individual firms' stock returns, show that
nearly 40 percent of the variation of stock returns from
portfolios of small firms can be predicted. But it may be
worthwhile to note that their results are obtained over much
longer time horizons of 3 to 5 years than the one-month
horizon we assume in this work.

Third, as already shown in Section (5.2.2), relations
between stock returns and macroeconomic variables such as
interest rates, GNP growth rates, growth rates of domestic
investment, and inflation rates are not stable. It can easily
be seen that relationships such as a strong positive effect of
changes in industrial production, a positive effect of the
term spreads and a negative effect of inflation rates on the
stock returns are valid only for some sub-sample periods and
change constantly over time.

Figure (5.1), Figure (5.2), Figure (5.3), and Figure (5.4)
show how the estimates of the coefficients of the measurement
equation vary over time. In Section (5.2.3), we find that the
null hypothesis that the parameters of our model do not vary
over time is rejected at the 1 percent significance level.

6.3 Stock Returns and Volatility

In this section, I analyze the relationship between the
stock returns and their volatilities. First, I find that there
exists a negative correlation between the current stock return
and next month's volatility. To test it, I run a regression of
the current volatility of stock returns on the
one-month-lagged stock return, that is,

$VOLAT_t = \Gamma_0 + \Gamma_1 RET_{t-1} + e_t$  (6.3a)

where $VOLAT_t$, $RET_{t-1}$ and $e_t$ stand for the stock return
volatility at time t, the stock return at time t-1 and the
disturbance term, respectively. The estimation results are
given in Table (6.3a).


Table 6.3a Results of Estimation of the Eq (6.3a)

Variable      Estimated Coefficient    Standard Error    t-statistic
$\Gamma_0$     0.00054                 0.00129            0.42301
$\Gamma_1$    -0.00249                 0.00073           -3.41563

This supports the previous results of Nelson (1989d),
Schwert (1989a) and Turner, Startz and Nelson (1989) that
declines in the stock market tend to be associated with
subsequent increases in volatility. Figure 6.2 shows the
relationship between the monthly stock return indexes and
their volatilities.
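
A regression of current volatility on the one-month-lagged return, as
in Eq (6.3a), only requires pairing each volatility observation with
the return of the preceding month. The sketch below shows this
alignment with a simple one-regressor OLS. The series are constructed
for illustration and are not the thesis data; the function names are
hypothetical.

```python
def simple_ols(x, y):
    """One-regressor OLS; returns (intercept, slope, slope standard error)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    return intercept, slope, (s2 / sxx) ** 0.5


def volatility_on_lagged_return(returns, volatility):
    """Regress VOLAT_t on RET_{t-1}: drop the last return and first volatility."""
    return simple_ols(returns[:-1], volatility[1:])


# Constructed example: volatility falls when last month's return rises
returns = [0.02, -0.05, 0.01, -0.03, 0.04, -0.01, 0.03, -0.02]
noise = [0.0002, -0.0001, 0.0001, -0.0002, 0.0001, 0.0, -0.0001]
volatility = [0.003] + [
    0.003 - 0.02 * returns[t - 1] + noise[t - 1] for t in range(1, len(returns))
]

intercept, slope, se_slope = volatility_on_lagged_return(returns, volatility)
t_stat = slope / se_slope
print(f"slope = {slope:.5f}, t-statistic = {t_stat:.2f}")
```

A negative and significant slope in such a regression is what the text
interprets as declines in the market being followed by higher
volatility.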

Figure 6.2: The U.S. monthly stock returns and their estimated volatilities

Next, according to mean-variance theory in finance, the
high risk associated with portfolios of risky financial assets
should be compensated by a high expected return from the
portfolios. We can employ the volatility of the stock return
index as a measure of the overall risk associated with
investment in stock markets. Then we can derive a positive
correlation between the expected return and its volatility.
Since the current stock return is positively related to the
expected return, the current stock return should be positively
related to the volatility. Running the regression of Eq
(6.3b), I confirm this relation (see Table (6.3b)).

$RET_t = \Phi_0 + \Phi_1 VOLAT_t + v_t$  (6.3b)

Table 6.3b Results of Estimation of the Eq (6.3b)

Variable    Estimated Coefficient    Standard Error    t-statistic
C           0.00721                  0.00169           4.27596
VOLATILT    3.18468                  1.21154           2.62861

CHAPTER  7 

CONCLUDING  REMARKS 


The main purpose of this work is to construct a time
series model which can explain the movement of stock returns
well. One way to achieve this is to construct the stock return
generating process, which depends on some macroeconomic
variables such as the short-term interest rates, the inflation
rates and the industrial production, and to allow the
coefficients of the linear regression model to vary with some
degree of regularity over time by incorporating the Kalman
filter into the linear model. Before constructing the Kalman
filter model, I considered it necessary first to clarify the
distribution of stock returns, since previous work points out
that stock returns tend to have fatter tails and larger
kurtosis than the normal distribution. Out of several density
estimation methods, the naive estimator and the kernel
estimator are obtained with estimation procedures different
from the previous ones. I find that the normal distribution
and the so-called generalized error distribution (GED) are
indistinguishable in capturing the true distribution and that
the GED requires higher computational cost than the normal
distribution. Thus the normal distribution is employed in this
work as the distribution of monthly stock returns.

In Chapter 4, I construct a more general form of the Kalman
filter model allowing the disturbance terms in both the
measurement equation and the transition equation to follow an
ARCH(1) process. It is shown that the ARCH(1) assumption
yields different forms of the prediction equation and the
updating equation.

In Chapter 5, a simpler form than the one described in the
previous chapter is estimated by the Davidon-Fletcher-Powell
iteration method, since the general form of the time-varying
parameter model requires many unknown parameters to be
estimated simultaneously and therefore incurs tremendous
computational costs. Based on the estimation results, I
perform two tests: the test of the time-varying model against
the time-invariant model and the test of stock market
efficiency. In the first test, the null hypothesis that the
true model of the stock returns is time-invariant is rejected
at the 1 % significance level. In the second test, the null
hypothesis that the stock markets are informationally
efficient is not rejected at the 5 % level.

I also compare the performance of the model employed in this
thesis with that of the alternative models, namely the OLS
model, the GARCH(3,3) model and the ARIMA(2,0,2) model, and
find that our model has better explanatory power for the
variation of the monthly CRSP stock returns than the
alternative models do. Finally, based on the results of the
estimation of the Kalman filter model, the characteristics of
the U.S. stock returns which are found in the previous work
are reinterpreted. I find that many of them lead to different
implications.


APPENDIX

Table A.1 Criteria for Choosing the Appropriate Values of p and q

NAR        NMA: 0        1        2        3        4        5        6
 1   AIC   -5.8843  -5.8819  -5.8803  -5.8880  -5.8863  -5.8979  -5.8959
     BIC   -5.8800  -5.8734  -5.8674  -5.8709  -5.8649  -5.8722  -5.8660
     PIC   -5.8827  -5.8787  -5.8754  -5.8816  -5.8783  -5.8882  -5.8846

 2   AIC   -5.8822  -5.8805  -5.9009  -5.8984  -5.8961  -5.8998  -5.8981
     BIC   -5.8737  -5.8677  -5.8838  -5.8770  -5.8705  -5.8698  -5.8639
     PIC   -5.8790  -5.8757  -5.8944  -5.8904  -5.8865  -5.8885  -5.8852

 3   AIC   -5.8860  -5.8845  -5.8984  -5.8959  -5.8942  -5.8982  -5.8965
     BIC   -5.8732  -5.8674  -5.8770  -5.8702  -5.8643  -5.8639  -5.8580
     PIC   -5.8812  -5.8780  -5.8904  -5.8862  -5.8829  -5.8853  -5.8820

 4   AIC   -5.8873  -5.8856  -5.8961  -5.8944  -5.8927  -4.6907  -4.6890
     BIC   -5.8702  -5.8642  -5.8704  -5.8645  -5.8585  -4.6522  -4.6462
     PIC   -5.8809  -5.8776  -5.8864  -5.8831  -5.8798  -4.6762  -4.6729

 5   AIC   -5.8964  -5.8969  -5.8956  -5.8978  -4.9850  -5.0890  -5.0873
     BIC   -5.8750  -5.8712  -5.8657  -5.8636  -4.9465  -5.0462  -5.0402
     PIC   -5.8883  -5.8872  -5.8844  -5.8849  -4.9705  -5.0728  -5.0695

 6   AIC   -5.8964  -5.8946  -5.8929  -5.8912  -5.0539  -5.0522  -5.0505
     BIC   -5.8708  -5.8647  -5.8587  -5.8527  -5.0111  -5.0052  -4.9992
     PIC   -5.8868  -5.8833  -5.8800  -5.8767  -5.0378  -5.0345  -5.0312

 7   AIC   -5.8965  -4.6121  -4.6104  -5.9033  -5.9022  -5.8997  -5.9013
     BIC   -5.8666  -4.5778  -4.5719  -5.8605  -5.8551  -5.8483  -5.8456
     PIC   -5.8853  -4.5992  -4.5959  -5.8872  -5.8844  -5.8803  -5.8803

 8   AIC   -5.9003  -5.8994  -5.9040  -5.9023  -5.9006  -5.8989  -5.8972
     BIC   -5.8661  -5.8609  -5.8612  -5.8552  -5.8492  -5.8433  -5.8373
     PIC   -5.8874  -5.8849  -5.8878  -5.8845  -5.8812  -5.8779  -5.8746

 9   AIC   -5.8993  -5.8969  -5.8952  -5.8935  -5.1419  -5.1402  -5.1385
     BIC   -5.8607  -5.8541  -5.8482  -5.8422  -5.0863  -5.0803  -5.0743
     PIC   -5.8847  -5.8808  -5.8775  -5.8742  -5.1209  -5.1176  -5.1143

(Continued)

NAR        NMA: 7        8        9        10       11       12
 1   AIC   -5.8915  -5.8911  -5.8981  -5.8964  -5.8932  -5.8915
     BIC   -5.8573  -5.8526  -5.8553  -5.8494  -5.8418  -5.8358
     PIC   -5.8786  -5.8766  -5.8820  -5.8787  -5.8738  -5.8705

 2   AIC   -5.9058  -4.9586  -5.8956
     BIC   -5.8673  -4.9159  -5.8485
     PIC   -5.8913  -4.9425  -5.8779

 3   AIC   -5.0550  -5.9012  -5.9036  -5.9019  -5.9003  -5.8986
     BIC   -5.0122  -5.8541  -5.8523  -5.8463  -5.8403  -5.8344
     PIC   -5.0389  -5.8834  -5.8843  -5.8810  -5.8777  -5.8744

 4   AIC   -5.9062  -5.9040  -3.5525  -5.8879  -5.8863  -5.8846
     BIC   -5.8591  -5.8526  -3.4969  -5.8280  -5.8221  -5.8161
     PIC   -5.8884  -5.8846  -3.5315  -5.8654  -5.8621  -5.8588

 5   AIC   -5.8924  -5.8907  -5.8890  -2.6152  -2.6135  -2.6118
     BIC   -5.8411  -5.8351  -5.8291  -2.5510  -2.5451  -2.5391
     PIC   -5.8731  -5.8698  -5.8665  -2.5910  -2.5877  -2.5844

 6   AIC   -5.8851  -5.8896  -4.0352  -4.0335  -4.0319  -4.0302
     BIC   -5.8295  -5.8297  -3.9710  -3.9651  -3.9591  -3.9531
     PIC   -5.8642  -5.8670  -4.0110  -4.0077  -4.0044  -4.0011

AIC : the Akaike Information Criterion
BIC : the Schwarz Information Criterion
PIC : the Hannan Information Criterion
NMA : the Number of Moving Average Terms
NAR : the Number of Autoregressive Terms